In the Ames Housing dataset, which is commonly used for predicting housing prices, several features can strongly influence the sale price of a house. Their relative importance depends on the machine learning algorithm used and on how the data are preprocessed. However, based on general observations and common practice, the following features are often considered strong predictors of housing prices:
1. Overall Quality (“Overall Qual”): the overall quality of a house, rated on a scale from 1 to 10, is a crucial factor affecting its sale price. Higher-quality homes tend to command higher prices; in this dataset it has the strongest correlation with SalePrice of all numeric variables (r = 0.80, see the correlation table below).
2. Above Ground Living Area (“Gr Liv Area”): the size of the above-ground living area, measured in square feet, is a strong indicator of a house’s value. Larger houses generally have higher prices (r = 0.71).
3. Number of Bedrooms (“Bedroom AbvGr”): the number of bedrooms matters to many buyers, and houses with more bedrooms tend to be priced higher. In this dataset, however, its correlation with SalePrice is weak (r = 0.14).
4. Number of Bathrooms (“Full Bath”, “Half Bath”, “Bsmt Full Bath”, “Bsmt Half Bath”): similarly, more bathrooms often lead to higher prices (r = 0.55 for “Full Bath”).
5. Lot Size (“Lot Area”): larger lots are generally associated with higher prices, especially in desirable locations, although the correlation in this dataset is relatively weak (r = 0.27).
6. Neighborhood: the neighborhood in which a house is located can have a significant impact on its value. Because it is a categorical variable, it does not appear in the correlation table below and is explored with a box plot instead.
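Because the Ames data spread bathrooms over four columns, a single combined count can be handy when comparing houses. The sketch below uses made-up values; the function name `total_baths()` is hypothetical, and weighting half baths by 0.5 is a common convention assumed here, not something the dataset defines.

```r
# Made-up data frame with the four Ames bathroom columns
houses <- data.frame(
  `Full Bath`      = c(1, 2, 2, 3),
  `Half Bath`      = c(0, 1, 0, 1),
  `Bsmt Full Bath` = c(0, 1, 0, 1),
  `Bsmt Half Bath` = c(0, 0, 1, 0),
  check.names = FALSE  # keep the spaces in the column names, as in the Ames file
)

# Hypothetical combined feature: half baths count as 0.5 (assumed convention).
# Note: an NA in any of the four columns propagates to the result.
total_baths <- function(d) {
  d[["Full Bath"]] + d[["Bsmt Full Bath"]] +
    0.5 * (d[["Half Bath"]] + d[["Bsmt Half Bath"]])
}

houses$`Total Baths` <- total_baths(houses)
print(houses)
```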
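pkgs <- c("data.table", "DT", "kableExtra", "ggplot2", "forcats", "purrr", "corrplot", "caret")
invisible(lapply(pkgs, library, character.only = TRUE)) # packages used below
uri <- paste0(github_ames, "AmesHousing.csv") # github_ames: base URL, assumed to be defined in a setup chunk
df <- read.csv(uri) # data.frame
dt <- fread(uri) # data.table
The Ames Housing dataset contains information from the Ames Assessor’s Office used in computing assessed values for individual residential properties sold in Ames, Iowa (IA) from 2006 to 2010.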
The dataset has 2,930 observations with 82 variables. For a complete description of all included variables, please look at: https://rdrr.io/cran/AmesHousing/man/ames_raw.html.
Familiarize yourself with the data.
Provide a table with descriptive statistics for all included variables and check:
- the class of each variable (e.g. factor or continuous);
- descriptive/summary statistics for all continuous variables (e.g. mean, SD, range) and factor variables (e.g. frequencies);
- missing values, e.g. with sapply(df, function(x) sum(is.na(x))).
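A minimal base-R sketch of the first check, on a small made-up data frame that borrows a few Ames column names: `vapply()` records the class of each variable and `table()` counts how many variables have each class. The data.table code below does something similar with `typeof()`, which reports storage types (a factor, for instance, has type "integer").

```r
# Made-up stand-in for the Ames data
df_toy <- data.frame(
  SalePrice    = c(200000L, 100000L, 150000L),
  `Lot Area`   = c(30000L, 11000L, 14000L),
  Neighborhood = c("NAmes", "NAmes", "Gilbert"),
  Alley        = c(NA, "Pave", NA),
  check.names = FALSE, stringsAsFactors = FALSE
)

# class of each variable (first element, in case an object has several classes)
var_class <- vapply(df_toy, function(x) class(x)[1], character(1))
print(var_class)

# number of variables per class
class_counts <- table(var_class)
print(class_counts)
```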
dt %>%
setcolorder(c("Order", "SalePrice")) %>%
DT::datatable(
caption = "Table 1: Ames Housing dataset",
class = "compact stripe",
rownames = FALSE,
filter = 'top',
extensions = c('FixedColumns'),
options = list(
scrollX = TRUE,
fixedColumns = list(leftColumns = 2)
)
) %>%
formatCurrency("SalePrice", '\U0024', digits = 0) %>%
formatStyle(
'SalePrice',
color = "#003700",
fontWeight = "bold",
backgroundColor = '#FFFFF0',
backgroundSize = '100% 60%',
backgroundRepeat = 'no-repeat',
backgroundPosition = 'center'
) %>%
formatStyle(
'Order',
color = '#C0C0C0',
backgroundColor = '#FFFFF0'
)
Hints: check the classes with str (no package needed), use the describe function (from the psych-package) for continuous variables, and the table function (base-R) for factor variables.
# To check the structure of the data, you can use the "str"-command:
# str(dt)
# create a table with the type of the data
dt_str <-
dt[, lapply(.SD, typeof)] %>%
melt.data.table(
measure.vars = names(.),
variable.factor = FALSE) %>%
setorder(value, variable )
# display a summary per type
dt_str %>%
.[, .(count = .N), by = value] %>%
DT::datatable(
caption = "Table 2: Data structure summary",
class = "compact stripe",
rownames = FALSE,
options = list(
dom = "t"
)
) %>%
formatStyle(
"value",
color = "#370037",
backgroundColor = "#FFFFF0",
fontWeight = "bold"
)
# display structure/type of the data
dt_str %>%
DT::datatable(
caption = "Table 3: Data structure and types",
class = "compact stripe",
rownames = FALSE,
filter = "top"
) %>%
formatStyle(
"variable",
color = "#370037",
backgroundColor = "#FFFFF0",
fontWeight = "bold"
)
dt_chr <- dt_str[value == "character", variable]
dt_int <- dt_str[value == "integer", variable]
All factor variables are currently stored with the ‘character’ class.
The following code converts each character variable into a factor variable:
df[sapply(df, is.character)] <- lapply(df[sapply(df, is.character)], as.factor)
# str(df)
# data.table version: convert character variables to factors and keep all other columns (e.g. the integers) unchanged
chr2fct <- function(x){
if(is.character(x))
as.factor(x)
else
x
}
dt[, (names(dt)) := lapply(.SD, chr2fct)]
# display the factors and levels
str(dt[, ..dt_chr])
Classes 'data.table' and 'data.frame': 2930 obs. of 43 variables:
$ Alley : Factor w/ 2 levels "Grvl","Pave": NA NA NA NA NA NA NA NA NA NA ...
$ Bldg Type : Factor w/ 5 levels "1Fam","2fmCon",..: 1 1 1 1 1 1 5 5 5 1 ...
$ Bsmt Cond : Factor w/ 6 levels "","Ex","Fa","Gd",..: 4 6 6 6 6 6 6 6 6 6 ...
$ Bsmt Exposure : Factor w/ 5 levels "","Av","Gd","Mn",..: 3 5 5 5 5 5 4 5 5 5 ...
$ Bsmt Qual : Factor w/ 6 levels "","Ex","Fa","Gd",..: 6 6 6 6 4 6 4 4 4 6 ...
$ BsmtFin Type 1: Factor w/ 7 levels "","ALQ","BLQ",..: 3 6 2 2 4 4 4 2 4 7 ...
$ BsmtFin Type 2: Factor w/ 7 levels "","ALQ","BLQ",..: 7 5 7 7 7 7 7 7 7 7 ...
$ Central Air : Factor w/ 2 levels "N","Y": 2 2 2 2 2 2 2 2 2 2 ...
$ Condition 1 : Factor w/ 9 levels "Artery","Feedr",..: 3 2 3 3 3 3 3 3 3 3 ...
$ Condition 2 : Factor w/ 8 levels "Artery","Feedr",..: 3 3 3 3 3 3 3 3 3 3 ...
$ Electrical : Factor w/ 6 levels "","FuseA","FuseF",..: 6 6 6 6 6 6 6 6 6 6 ...
$ Exter Cond : Factor w/ 5 levels "Ex","Fa","Gd",..: 5 5 5 5 5 5 5 5 5 5 ...
$ Exter Qual : Factor w/ 4 levels "Ex","Fa","Gd",..: 4 4 4 3 4 4 3 3 3 4 ...
$ Exterior 1st : Factor w/ 16 levels "AsbShng","AsphShn",..: 4 14 15 4 14 14 6 7 6 14 ...
$ Exterior 2nd : Factor w/ 17 levels "AsbShng","AsphShn",..: 11 15 16 4 15 15 6 7 6 15 ...
$ Fence : Factor w/ 4 levels "GdPrv","GdWo",..: NA 3 NA NA 3 NA NA NA NA NA ...
$ Fireplace Qu : Factor w/ 5 levels "Ex","Fa","Gd",..: 3 NA NA 5 5 3 NA NA 5 5 ...
$ Foundation : Factor w/ 6 levels "BrkTil","CBlock",..: 2 2 2 2 3 3 3 3 3 3 ...
$ Functional : Factor w/ 8 levels "Maj1","Maj2",..: 8 8 8 8 8 8 8 8 8 8 ...
$ Garage Cond : Factor w/ 6 levels "","Ex","Fa","Gd",..: 6 6 6 6 6 6 6 6 6 6 ...
$ Garage Finish : Factor w/ 4 levels "","Fin","RFn",..: 2 4 4 2 2 2 2 3 3 2 ...
$ Garage Qual : Factor w/ 6 levels "","Ex","Fa","Gd",..: 6 6 6 6 6 6 6 6 6 6 ...
$ Garage Type : Factor w/ 6 levels "2Types","Attchd",..: 2 2 2 2 2 2 2 2 2 2 ...
$ Heating : Factor w/ 6 levels "Floor","GasA",..: 2 2 2 2 2 2 2 2 2 2 ...
$ Heating QC : Factor w/ 5 levels "Ex","Fa","Gd",..: 2 5 5 1 3 1 1 1 1 3 ...
$ House Style : Factor w/ 8 levels "1.5Fin","1.5Unf",..: 3 3 3 3 6 6 3 3 3 6 ...
$ Kitchen Qual : Factor w/ 5 levels "Ex","Fa","Gd",..: 5 5 3 1 5 3 3 3 3 3 ...
$ Land Contour : Factor w/ 4 levels "Bnk","HLS","Low",..: 4 4 4 4 4 4 4 2 4 4 ...
$ Land Slope : Factor w/ 3 levels "Gtl","Mod","Sev": 1 1 1 1 1 1 1 1 1 1 ...
$ Lot Config : Factor w/ 5 levels "Corner","CulDSac",..: 1 5 1 1 5 5 5 5 5 5 ...
$ Lot Shape : Factor w/ 4 levels "IR1","IR2","IR3",..: 1 4 1 4 1 1 4 1 1 4 ...
$ MS Zoning : Factor w/ 7 levels "A (agr)","C (all)",..: 6 5 6 6 6 6 6 6 6 6 ...
$ Mas Vnr Type : Factor w/ 6 levels "","BrkCmn","BrkFace",..: 6 5 3 5 5 3 5 5 5 5 ...
$ Misc Feature : Factor w/ 5 levels "Elev","Gar2",..: NA NA 2 NA NA NA NA NA NA NA ...
$ Neighborhood : Factor w/ 28 levels "Blmngtn","Blueste",..: 16 16 16 16 9 9 25 25 25 9 ...
$ Paved Drive : Factor w/ 3 levels "N","P","Y": 2 3 3 3 3 3 3 3 3 3 ...
$ Pool QC : Factor w/ 4 levels "Ex","Fa","Gd",..: NA NA NA NA NA NA NA NA NA NA ...
$ Roof Matl : Factor w/ 8 levels "ClyTile","CompShg",..: 2 2 2 2 2 2 2 2 2 2 ...
$ Roof Style : Factor w/ 6 levels "Flat","Gable",..: 4 2 4 4 2 2 2 2 2 2 ...
$ Sale Condition: Factor w/ 6 levels "Abnorml","AdjLand",..: 5 5 5 5 5 5 5 5 5 5 ...
$ Sale Type : Factor w/ 10 levels "COD","Con","ConLD",..: 10 10 10 10 10 10 10 10 10 10 ...
$ Street : Factor w/ 2 levels "Grvl","Pave": 2 2 2 2 2 2 2 2 2 2 ...
$ Utilities : Factor w/ 3 levels "AllPub","NoSeWa",..: 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, ".internal.selfref")=<externalptr>
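The output above shows an empty level "" in several factors (e.g. Bsmt Qual, Electrical, Garage Finish): in this file some fields are read as the empty string rather than as NA, so they do not show up in the missing-value counts that follow. Whether such blanks should count as missing is a judgment call. The sketch below uses a made-up CSV snippet to show how the `na.strings` argument of `read.csv()` changes this; `fread()` has an argument with the same name.

```r
# Made-up CSV snippet in which the second Bsmt Qual value is empty
csv_text <- "Bsmt Qual,SalePrice\nGd,200000\n,100000\nTA,150000"

# Default: the empty field is kept as "" and becomes its own factor level
d1 <- read.csv(text = csv_text, check.names = FALSE, stringsAsFactors = TRUE)
print(levels(d1$`Bsmt Qual`))

# With na.strings = c("", "NA") the empty field is read as NA instead
d2 <- read.csv(text = csv_text, check.names = FALSE, stringsAsFactors = TRUE,
               na.strings = c("", "NA"))
print(levels(d2$`Bsmt Qual`))
print(sum(is.na(d2$`Bsmt Qual`)))
```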
Create a table with the number of missing values per variable.
# sapply(df, function(x) sum(is.na(x)))
# table of missing values per variable
# (f_kbl_with_NA() is a helper defined outside this section; it lists the variables that contain NAs)
f_kbl_with_NA(dt) %>%
kable_styling(
full_width = FALSE,
position = "left",
htmltable_class = "lightable-hover lightable-condensed lightable-striped"
)
| variable | value |
|---|---|
| Pool QC | 2917 |
| Misc Feature | 2824 |
| Alley | 2732 |
| Fence | 2358 |
| Fireplace Qu | 1422 |
| Lot Frontage | 490 |
| Garage Yr Blt | 159 |
| Garage Qual | 158 |
| Garage Cond | 158 |
| Garage Type | 157 |
| Garage Finish | 157 |
| Bsmt Qual | 79 |
| Bsmt Cond | 79 |
| Bsmt Exposure | 79 |
| BsmtFin Type 1 | 79 |
| BsmtFin Type 2 | 79 |
| Mas Vnr Area | 23 |
| Bsmt Full Bath | 2 |
| Bsmt Half Bath | 2 |
| BsmtFin SF 1 | 1 |
| BsmtFin SF 2 | 1 |
| Bsmt Unf SF | 1 |
| Total Bsmt SF | 1 |
| Garage Cars | 1 |
| Garage Area | 1 |
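The helper `f_kbl_with_NA()` is not defined in this section. Assuming it simply lists the variables with at least one NA, sorted by their count, a hypothetical base-R stand-in (here called `na_table()`, a name invented for illustration) could look like this, shown on made-up data:

```r
# Hypothetical stand-in for f_kbl_with_NA(): count NAs per column,
# keep the columns with at least one NA, largest count first
na_table <- function(d) {
  n_na <- vapply(d, function(x) sum(is.na(x)), integer(1))
  n_na <- sort(n_na[n_na > 0], decreasing = TRUE)
  data.frame(variable = names(n_na), value = unname(n_na),
             row.names = NULL, stringsAsFactors = FALSE)
}

# Made-up data with missing values in two columns
toy <- data.frame(
  `Pool QC`      = c(NA, NA, "Ex"),
  `Lot Frontage` = c(65L, NA, 80L),
  SalePrice      = c(200000L, 100000L, 150000L),
  check.names = FALSE, stringsAsFactors = FALSE
)
print(na_table(toy))
```

Formatting the result as an HTML table (e.g. with kableExtra, as in the code above) is a separate step.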
Create a table with descriptive statistics for all included variables.
For continuous variables, you can use the describe function (from the psych-package).
For factor variables, you can use the table function (base-R).
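If the psych package is not at hand, a similar overview can be sketched in base R. The snippet below uses made-up data and computes only a subset of the statistics that `psych::describe()` reports (n, mean, SD, min, max), plus a frequency table per factor variable.

```r
# Made-up data with one continuous and one factor variable
toy <- data.frame(
  SalePrice     = c(200000, 100000, 150000, 250000, 180000),
  `House Style` = factor(c("1Story", "1Story", "2Story", "1Story", "2Story")),
  check.names = FALSE
)

# Continuous variables: one row of summary statistics per variable
num_cols <- names(toy)[vapply(toy, is.numeric, logical(1))]
num_stats <- t(vapply(toy[num_cols], function(x) c(
  n = sum(!is.na(x)), mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE),
  min = min(x, na.rm = TRUE), max = max(x, na.rm = TRUE)
), numeric(5)))
print(num_stats)

# Factor variables: frequency table per variable
fct_cols <- names(toy)[vapply(toy, is.factor, logical(1))]
fct_freq <- lapply(toy[fct_cols], table)
print(fct_freq)
```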
# describe() on the integer columns; its row names (the variable names) become the cont_vars column
psych::describe(dt[, ..dt_int]) %>%
as.data.table(keep.rownames = "cont_vars") %>%
DT::datatable(
caption = "Table 4: Describe numerics",
class = "stripe",
rownames = FALSE,
filter = "top",
extensions = c('FixedColumns'),
options = list(
scrollX = TRUE,
fixedColumns = list(leftColumns = 1)
)
) %>%
formatStyle(
"cont_vars",
color = "#370037",
backgroundColor = "#FFFFF0",
fontWeight = "bold"
)
# frequency bar charts for all factor variables
fct_cols <- names(dt)[sapply(dt, is.factor)]
# reshape the factor columns into long format (one row per variable/value pair)
dt_long <- melt(dt[, ..fct_cols], measure.vars = fct_cols, variable.name = "Column")
# create a bar chart for each variable, bars sorted by frequency
ggplot(dt_long, aes(x = fct_infreq(value))) +
geom_bar() +
facet_wrap(~Column, scales = "free_x") +
labs(x = "Value", y = "Count") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
# frequency table per factor variable (base R)
temp <-
df %>%
purrr::keep(is.factor)
for (i in seq_along(temp)) {
print(names(temp)[i])
print(table(temp[[i]]))
}
There are several missing values in the dataset, which need to be tackled before we can proceed with the rest of the analysis.
There are many ways to impute missing values; for now, impute missing values in numeric variables with the median, and missing values in all factor variables with the label “100”.
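A minimal base-R sketch of both imputation rules on made-up data. The helper names `impute_median()` and `impute_label()` are hypothetical. Working on the factor levels directly avoids a round trip through character vectors, since `ifelse()` does not keep the factor class. A separate label such as "100" also fits the Ames documentation, where NA in many categorical variables means that the feature is absent (e.g. no pool or no alley access) rather than unknown.

```r
# Made-up data: one numeric and one factor column with missing values
toy <- data.frame(
  `Lot Frontage` = c(65, NA, 80, 70),
  Alley          = factor(c(NA, "Pave", NA, "Grvl")),
  check.names = FALSE
)

# numeric columns: replace NA by the median of the observed values
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# factor columns: add the label as an extra level, then use it for NA
impute_label <- function(x, label = "100") {
  if (!label %in% levels(x)) levels(x) <- c(levels(x), label)
  x[is.na(x)] <- label
  x
}

# apply column by column; toy[] <- keeps the data.frame structure and names
toy[] <- lapply(toy, function(x) {
  if (is.numeric(x)) impute_median(x)
  else if (is.factor(x)) impute_label(x)
  else x
})
print(toy)
```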
# impute NA with median for all numeric variables
dt[, (dt_int) := lapply(.SD, function(x){
ifelse(is.na(x), median(x, na.rm = TRUE), x)}), .SDcols = dt_int]
# table of missing values per variable
f_kbl_with_NA(dt)
| variable | value |
|---|---|
| Pool QC | 2917 |
| Misc Feature | 2824 |
| Alley | 2732 |
| Fence | 2358 |
| Fireplace Qu | 1422 |
| Garage Qual | 158 |
| Garage Cond | 158 |
| Garage Type | 157 |
| Garage Finish | 157 |
| Bsmt Qual | 79 |
| Bsmt Cond | 79 |
| Bsmt Exposure | 79 |
| BsmtFin Type 1 | 79 |
| BsmtFin Type 2 | 79 |
df <-
lapply(df, function(x) {
# impute the median for all missing numeric values
if (is.numeric(x)) ifelse(is.na(x), median(x, na.rm = TRUE), x) else x
}) %>%
data.frame()
# generate a vector with the names of all factor variables
factor_variables <-
df %>%
keep(is.factor) %>%
names
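# impute missing values for factor variables
# (use the labels: ifelse() does not preserve factors and can return their integer codes)
df <-
lapply(df, function(x) {
if (is.factor(x)) ifelse(is.na(x), "100", as.character(x)) else x
}) %>%
data.frame()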
# 100 imputation for factor variables
dt[, (dt_chr) := lapply(.SD, function(x) {
ifelse(is.na(x), "100", as.character(x))
}), .SDcols = dt_chr]
# convert the imputed columns back to factors
# (the imputation turned them into character variables)
df[factor_variables] <- lapply(df[factor_variables], factor)
dt[, (dt_chr) := lapply(.SD, as.factor), .SDcols = dt_chr]
# sapply(df, function(x) sum(is.na(x)))
# table of missing values per variable
f_kbl_with_NA(dt)
| variable | value |
|---|---|

The table is empty: no missing values remain after the imputation.
Explore the outcome variable (SalePrice) and how it correlates with other features
The variable “SalePrice” is the price at which a property was sold and hence the variable of interest for our prediction model (“Y” or dependent variable).
Please explore Y in terms of:
- the distribution of Y (e.g. use base-R “hist” or “ggplot” from the “ggplot2”-package);
- the distribution of Y across subgroups (e.g. create a boxplot or scatterplot using the “ggplot2”-package):
  - differences between neighborhoods;
  - differences between housing styles;
- a correlation plot showing the correlations between Y and the independent (numeric) variables.
For visualization, ggplot is frequently used, as it provides a flexible way to draw many different kinds of graphs.
A ggplot contains two basic elements:
1. The initiation command, ggplot(DATASET, aes(x=XVAR, y=YVAR, group=XVAR)). On its own, this draws a blank ggplot: even though x and y are specified, there are no points or lines in it.
2. The geom of interest, added with + (for this exercise you’ll need + geom_point() for a scatterplot or + geom_boxplot() for a boxplot).
The full code for a scatter plot would then be:
ggplot(DATASET, aes(x=XVAR, y=YVAR)) + geom_point()
To draw a correlation plot, please use the “corrplot”-package.
With this package, a correlation plot can be constructed in two steps:
1. Use “cor” to calculate the correlations between all combinations of numeric variables (select the numeric variables with df %>% keep(is.numeric)).
2. Plot the calculated correlations with the corrplot function.
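A base-R sketch of step 1 on made-up data, in which SalePrice is simulated to depend on Gr Liv Area only: select the numeric columns, compute the correlation matrix with `cor()`, and list every variable's correlation with SalePrice, strongest first. Step 2 would pass `corr_mat` to `corrplot::corrplot()`; it is left out so the sketch needs no extra package.

```r
# Made-up data: two numeric predictors, one factor, and a simulated SalePrice
set.seed(42)
n <- 50
toy <- data.frame(
  `Gr Liv Area` = round(runif(n, 800, 3000)),
  `Lot Area`    = round(runif(n, 5000, 20000)),
  Neighborhood  = factor(sample(c("NAmes", "OldTown"), n, replace = TRUE)),
  check.names = FALSE
)
toy$SalePrice <- 50000 + 100 * toy$`Gr Liv Area` + rnorm(n, sd = 20000)

# Step 1: keep the numeric columns and compute all pairwise Pearson correlations
num <- toy[vapply(toy, is.numeric, logical(1))]
corr_mat <- cor(num, use = "everything", method = "pearson")

# correlation of every numeric variable with SalePrice, strongest first
corr_y <- sort(corr_mat[, "SalePrice"], decreasing = TRUE)
print(round(corr_y, 2))
```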
# Descriptive/summary statistics (e.g. mean, SDs, range)
dt$SalePrice %>%
psych::describe() %>%
t() %>%
as.data.table(
keep.rownames = "stat") %>%
.[, .(stat,
SalePrice = X1)] %>%
kbl(
digits = 0,
caption = "Table 5: Descriptive statistics for Sale Price",
format.args = list(big.mark = ","),
align = 'l'
) %>%
kable_styling(
full_width = FALSE,
position = "left",
htmltable_class = "lightable-hover lightable-condensed lightable-striped")
| stat | SalePrice |
|---|---|
| vars | 1 |
| n | 2,930 |
| mean | 180,796 |
| sd | 79,887 |
| median | 160,000 |
| trimmed | 170,429 |
| mad | 54,856 |
| min | 12,789 |
| max | 755,000 |
| range | 742,211 |
| skew | 2 |
| kurtosis | 5 |
| se | 1,476 |
# Visualize the distribution of Y
# (e.g. use base-R "hist" or "ggplot" from the "ggplot2"-package)
hist(dt$SalePrice)
ggplot(data = dt, aes(SalePrice)) +
geom_histogram(fill = "#370037", color = "#FFFFF0", bins = 18) +
# scale_x_continuous(limits = c(0,600000), expand = c(0, 0)) +
# scale_y_continuous(limits = c(0,650) , expand = c(0, 0)) +
labs(title = "Histogram of Sale Price") +
ylab(label = "Count") +
xlab(label = "Sale Price") +
# theme_classic() +
theme(
axis.title.x = element_text(
colour = "#370037", size = 11.5, face = "bold"),
axis.title.y = element_text(
colour = "#370037", size = 11.5, face = "bold"),
plot.title = element_text(
colour = "#370037", size = 18 , face = "bold", hjust = 0)
)
# Visualize the distribution of Y by looking at various subgroups
# (e.g. create boxplot or scatterplot using the "ggplot2"-package)
# Scatterplot
ggplot(data = dt, aes(x = `Lot Area`, y = SalePrice)) +
geom_point(size = .7, color = "#370037") +
scale_x_continuous(limits = c(0, 50000) , expand = c(0, 0)) +
scale_y_continuous(limits = c(0, 600000), expand = c(0, 0)) +
labs(title = "Scatterplot Sale Price by Lot Area") +
ylab(label = "Sale Price") +
xlab(label = "Lot area") +
# theme_classic() +
theme(
axis.title.x = element_text(
colour = "#370037", size = 11.5, face = "bold"),
axis.title.y = element_text(
colour = "#370037", size = 11.5, face = "bold"),
plot.title = element_text(
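colour = "#370037", size = 18 , face = "bold", hjust = 0))
Warning: Removed 20 rows containing missing values (`geom_point()`).
The warning is expected: points outside the axis limits set with scale_x_continuous() and scale_y_continuous() are dropped from the plot.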
# Boxplot
dt[, avgSP := mean(SalePrice), by = Neighborhood] %>%
.[, Neighborhood := fct_reorder(Neighborhood, avgSP)] %>%
.[, avgSP := NULL] %>%
ggplot(aes(x = Neighborhood, y = SalePrice)) +
geom_boxplot(color = "#370037", fill = "#FFFFF0") +
labs(title = "Boxplot Sale Price by Neighbourhood") +
ylab(label = "Sale Price") +
xlab(label = "Neighbourhood") +
# theme_classic() +
theme(
axis.title.x = element_text(
colour = "#370037", size = 11.5, face = "bold"),
axis.title.y = element_text(
colour = "#370037", size = 11.5, face = "bold"),
plot.title = element_text(
colour = "#370037", size = 18 , face = "bold", hjust = 0),
axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)
)
Both box plots are sorted by the mean of the dependent variable (SalePrice): the mean sale price is calculated for each level of the grouping variable (Neighborhood above, House Style below), and the levels are then reordered by that mean.
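The code above reorders the levels with `forcats::fct_reorder()` after adding a helper column with the group mean. A minimal base-R sketch of the same idea on made-up data uses `reorder()` with `FUN = mean`; `boxplot()` then draws the groups in the order of the factor levels (`plot = FALSE` returns the statistics without drawing).

```r
# Made-up data: sale prices in three neighborhoods
toy <- data.frame(
  Neighborhood = factor(c("A", "A", "B", "B", "C", "C")),
  SalePrice    = c(300, 320, 100, 140, 200, 180)
)

# reorder the levels by the group mean of SalePrice (lowest mean first)
toy$Neighborhood <- reorder(toy$Neighborhood, toy$SalePrice, FUN = mean)
print(levels(toy$Neighborhood))

# boxplot() follows the level order
bp <- boxplot(SalePrice ~ Neighborhood, data = toy, plot = FALSE)
print(bp$names)
```

With group means of 310 (A), 120 (B) and 190 (C), the levels come out as B, C, A.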
#|label: Look at differences between housing style
dt[, avgHS := mean(SalePrice), by = `House Style`] %>%
.[, `House Style` := fct_reorder(`House Style`, avgHS)] %>%
.[, avgHS := NULL] %>%
ggplot(aes(x = `House Style`, y = SalePrice)) +
geom_boxplot(color = "#370037", fill = "#FFFFF0") +
labs(title = "Boxplot Sale Price by House Style") +
ylab(label = "Sale Price") +
xlab(label = "House Style") +
# theme_classic() +
theme(
axis.title.x = element_text(
colour = "#370037", size = 11.5, face = "bold"),
axis.title.y = element_text(
colour = "#370037", size = 11.5, face = "bold"),
plot.title = element_text(
colour = "#370037", size = 18 , face = "bold", hjust = 0),
axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)
)
# corr_df <-
# df %>%
# keep(is.numeric) %>%
# cor
corr_dt <-
dt[, ..dt_int] %>%
cor(
use = "everything",
method = "pearson"
)
corrplot(
corr = corr_dt,
type = "upper",
title = "Correlation between all numeric variables in the dataset",
diag = FALSE,
order = 'hclust',
hclust.method = 'median',
addrect = 3,
number.font = 2,
tl.cex = 0.50,
mar = c(0, 0, 1, 0)
)
corr_dt[, "SalePrice"] %>%
as.data.table(
keep.rownames = "var",
check.names = FALSE
) %>%
setnames(".", "corr") %>%
setorder(-corr) %>%
kbl()
| var | corr |
|---|---|
| SalePrice | 1.0000000 |
| Overall Qual | 0.7992618 |
| Gr Liv Area | 0.7067799 |
| Garage Cars | 0.6478115 |
| Garage Area | 0.6403811 |
| Total Bsmt SF | 0.6321639 |
| 1st Flr SF | 0.6216761 |
| Year Built | 0.5584261 |
| Full Bath | 0.5456039 |
| Year Remod/Add | 0.5329738 |
| Garage Yr Blt | 0.5088825 |
| Mas Vnr Area | 0.5021960 |
| TotRms AbvGrd | 0.4954744 |
| Fireplaces | 0.4745581 |
| BsmtFin SF 1 | 0.4328618 |
| Lot Frontage | 0.3402558 |
| Wood Deck SF | 0.3271432 |
| Open Porch SF | 0.3129505 |
| Half Bath | 0.2850560 |
| Bsmt Full Bath | 0.2758227 |
| 2nd Flr SF | 0.2693734 |
| Lot Area | 0.2665492 |
| Bsmt Unf SF | 0.1828955 |
| Bedroom AbvGr | 0.1439134 |
| Screen Porch | 0.1121512 |
| Pool Area | 0.0684032 |
| Mo Sold | 0.0352588 |
| 3Ssn Porch | 0.0322246 |
| BsmtFin SF 2 | 0.0060176 |
| Misc Val | -0.0156915 |
| Yr Sold | -0.0305691 |
| Order | -0.0314079 |
| Bsmt Half Bath | -0.0358166 |
| Low Qual Fin SF | -0.0376598 |
| MS SubClass | -0.0850916 |
| Overall Cond | -0.1016969 |
| Kitchen AbvGr | -0.1198137 |
| Enclosed Porch | -0.1287874 |
| PID | -0.2465212 |
Note that Order and PID are identifiers (observation number and parcel ID), so their correlations with SalePrice carry no meaning.
Now that we have a better feel for the information in the dataset and have taken care of the missing values, we can start running some (additional) simple machine learning models.
We will use the “caret”-package for this exercise. Split the data randomly into a train set (70%) and a test set (30%).
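A minimal base-R sketch of a purely random 70/30 split on made-up data, in the spirit of the commented-out alternative below. caret’s `createDataPartition()` goes a step further: for a numeric `y`, it samples within percentile-based groups of `y`, so the split is stratified on whatever variable is passed (the code below passes `dt$Order`; passing `dt$SalePrice` would stratify on the outcome instead).

```r
# Made-up data with 10 rows
toy <- data.frame(Order = 1:10, SalePrice = seq(100000, 190000, by = 10000))

set.seed(1234)
# draw 70% of the row indices at random, without replacement
train_idx <- sample(nrow(toy), size = round(0.7 * nrow(toy)))
train <- toy[ train_idx, ]
test  <- toy[-train_idx, ]

print(c(train = nrow(train), test = nrow(test)))
```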
set.seed(1234)
# use the createDataPartition function to split the data
# from the caret package
trainIndex <-
createDataPartition(dt$Order, p = 0.7, list = FALSE)
train <- dt[ trainIndex, ]
test <- dt[-trainIndex, ]
# dt <-
# nrow(df) %>%
# sample(. * .7) %>%
# sort()
# train <- df[ dt,]
# test <- df[-dt,]
Next, we need to specify how to perform the cross-validation (CV), i.e. how the model is tuned and evaluated within the train set. To this end, we set the CV method, the number of folds, and the number of times the process is repeated. We will use the “repeatedcv” method with 10 folds and 3 repeats.
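To make “10 folds, 3 repeats” concrete: every row is assigned to one of k folds, each fold is held out once while the model is fitted on the others, and the whole assignment is redrawn r times, so the model is fitted k × r times (10 × 3 = 30 here). The helper below, `make_repeated_folds()`, is hypothetical and only illustrates the idea on a small n; caret handles this internally (and also exposes `createMultiFolds()` for building such folds).

```r
# Hypothetical helper (not part of caret): assign each of n rows to one of k folds,
# with a fresh random assignment for each of r repetitions
make_repeated_folds <- function(n, k = 10, r = 3) {
  lapply(seq_len(r), function(i) {
    # recycle the fold labels 1..k to length n, then shuffle them
    sample(rep_len(seq_len(k), n))
  })
}

set.seed(1234)
folds <- make_repeated_folds(n = 25, k = 5, r = 3)

# repetition 1, fold 1: these rows are held out, the remaining rows are used for fitting
holdout  <- which(folds[[1]] == 1)
analysis <- which(folds[[1]] != 1)
print(table(folds[[1]]))
```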
# Cross-validation strategy from the caret package
ctrl <-
trainControl(
method = "repeatedcv",
number = 10, # ten folds
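repeats = 3) # repeated three times
Next, we fit a linear regression model on the training data. Note that the plain lm() call below does not use the ctrl object; to apply the cross-validation strategy, the model would have to be fitted with caret’s train() function (e.g. train(SalePrice ~ ., data = train, method = "lm", trControl = ctrl)).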
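A minimal base-R sketch, on made-up data, of two things the summary below illustrates. First, a column that is an exact sum of other columns gets an NA coefficient (the “not defined because of singularities” message). In the Ames data, “Gr Liv Area” appears to be the sum of “1st Flr SF”, “2nd Flr SF” and “Low Qual Fin SF”, and “Total Bsmt SF” the sum of the three basement-area columns, which is presumably why their coefficients are NA below. Second, predictions on the held-out rows can be scored with the root mean squared error (RMSE).

```r
# Made-up data: 'total' is exactly first + second
set.seed(1)
n <- 40
d <- data.frame(first = runif(n, 500, 1500), second = runif(n, 0, 1000))
d$total <- d$first + d$second
d$price <- 20000 + 60 * d$first + 80 * d$second + rnorm(n, sd = 5000)

# 70/30 split of the row indices
train_idx <- sample(n, size = round(0.7 * n))
fit <- lm(price ~ first + second + total, data = d[train_idx, ])

# 'total' is aliased with first + second, so its coefficient is NA
print(coef(fit))

# score the held-out rows with the RMSE;
# predict() may warn about the rank-deficient fit, which is expected here
test <- d[-train_idx, ]
pred <- suppressWarnings(predict(fit, newdata = test))
rmse <- sqrt(mean((test$price - pred)^2))
print(rmse)
```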
# Fit the linear regression model on the training data
model <- lm(SalePrice ~ ., data = train)
# View the summary of the model
summary(model)
Call:
lm(formula = SalePrice ~ ., data = train)
Residuals:
Min 1Q Median 3Q Max
-108623 -9117 0 8723 137092
Coefficients: (14 not defined because of singularities)
Estimate Std. Error t value
(Intercept) 2722842.956669620 8294573.794499564 0.328
Order -2.354354717 6.543277522 -0.360
PID 0.000009718 0.000008334 1.166
`MS SubClass` -89.230780658 52.948346474 -1.685
`MS Zoning`C (all) -26228.322187269 38588.232181061 -0.680
`MS Zoning`FV -19606.709547030 38128.940642222 -0.514
`MS Zoning`RH -11315.575922059 38835.291647684 -0.291
`MS Zoning`RL -14479.936096872 37977.771903572 -0.381
`MS Zoning`RM -20180.104932288 38211.356913615 -0.528
`Lot Frontage` 96.798438411 32.332286531 2.994
`Lot Area` 0.824261413 0.084381732 9.768
StreetPave 26149.097525122 7575.464780134 3.452
AlleyGrvl -779.514062341 2809.153597201 -0.277
AlleyPave -2801.109060433 3408.925476162 -0.822
`Lot Shape`IR2 274.175879254 2929.662350538 0.094
`Lot Shape`IR3 -7136.147928025 7093.231588784 -1.006
`Lot Shape`Reg 317.543261399 1139.555980796 0.279
`Land Contour`HLS 3708.739706788 3618.749845858 1.025
`Land Contour`Low -8729.943381635 4758.852916513 -1.834
`Land Contour`Lvl -169.756257071 2631.606799095 -0.065
UtilitiesNoSeWa -8819.382867195 21285.266717197 -0.414
`Lot Config`CulDSac 5786.975358945 2372.863806098 2.439
`Lot Config`FR2 -4328.339150263 3020.144737983 -1.433
`Lot Config`FR3 -10303.598531211 6215.381414872 -1.658
`Lot Config`Inside 1956.164963018 1271.577733944 1.538
`Land Slope`Mod 5520.741930052 2849.930935904 1.937
`Land Slope`Sev -46017.255842182 9399.226708871 -4.896
NeighborhoodIDOTRR 1106.277815886 6547.326129450 0.169
NeighborhoodBrDale 18189.235624204 7892.965183201 2.304
NeighborhoodOldTown -2577.624845734 6356.073854917 -0.406
NeighborhoodBrkSide 10806.247884026 6549.638262048 1.650
NeighborhoodEdwards -2435.764495165 6103.998683332 -0.399
NeighborhoodSWISU 3071.274810076 7139.506504979 0.430
NeighborhoodSawyer 6057.487417378 6346.269123954 0.954
NeighborhoodNPkVill 20612.830498695 12989.664652203 1.587
NeighborhoodBlueste 15750.811918154 8800.657644940 1.790
NeighborhoodNAmes 2676.414363873 6522.336145101 0.410
NeighborhoodMitchel -5322.402306245 6327.534662866 -0.841
NeighborhoodSawyerW -564.163054828 6713.689160188 -0.084
NeighborhoodNWAmes -1384.135669945 7016.073774639 -0.197
NeighborhoodGilbert 2264.552397057 7247.853622363 0.312
NeighborhoodGreens 22927.927424497 13443.163357407 1.706
NeighborhoodBlmngtn 14474.049918767 8404.423343216 1.722
NeighborhoodCollgCr -145.460969071 6310.469156546 -0.023
NeighborhoodCrawfor 17200.163569459 6375.444025827 2.698
NeighborhoodClearCr -3336.471130602 7486.977046638 -0.446
NeighborhoodSomerst 18959.703695724 7863.681523745 2.411
NeighborhoodTimber -5103.520241003 6883.019410273 -0.741
NeighborhoodVeenker 12526.976825937 8470.689937306 1.479
NeighborhoodGrnHill 129087.345185478 20688.613379522 6.240
NeighborhoodNridgHt 21259.309839189 7337.921313934 2.897
NeighborhoodStoneBr 42301.435838708 7929.709337041 5.335
NeighborhoodNoRidge 31846.813486712 7751.978028915 4.108
`Condition 1`Feedr 2598.738052574 3494.115107048 0.744
`Condition 1`Norm 11692.346061837 2925.448236520 3.997
`Condition 1`PosA 8904.028569713 6498.244364544 1.370
`Condition 1`PosN 16603.155508856 4855.223225234 3.420
`Condition 1`RRAe 1964.537654860 5589.354533697 0.351
`Condition 1`RRAn 7045.312425976 5047.791820708 1.396
`Condition 1`RRNe 21185.094367610 12136.496885941 1.746
`Condition 1`RRNn 2086.976947011 8397.649166001 0.249
`Condition 2`Feedr 8793.050175287 14413.460922751 0.610
`Condition 2`Norm 18276.807563843 12820.603489534 1.426
`Condition 2`PosA 49380.462827279 18138.351806963 2.722
`Condition 2`PosN -448927.555945453 24634.729600699 -18.223
`Condition 2`RRAe 80503.431450650 42143.194131818 1.910
`Condition 2`RRAn 24313.417104288 23752.418035582 1.024
`Condition 2`RRNn 17647.783311428 23941.878301115 0.737
`Bldg Type`2fmCon 1319.243771936 8342.965911498 0.158
`Bldg Type`Duplex -19644.795445089 5163.398157855 -3.805
`Bldg Type`Twnhs -13342.521724962 6459.641892646 -2.066
`Bldg Type`TwnhsE -7355.193650511 5808.923589459 -1.266
`House Style`1.5Fin -9047.190950866 5745.395109022 -1.575
`House Style`SFoyer -3996.338047150 6606.393223949 -0.605
`House Style`SLvl -6329.199418709 6398.536923481 -0.989
`House Style`2.5Unf -6343.840652995 8112.321905675 -0.782
`House Style`1Story -3669.309390719 5583.852040347 -0.657
`House Style`2Story -7588.661729454 6238.365621746 -1.216
`House Style`2.5Fin -32329.354079245 11975.375543752 -2.700
`Overall Qual` 6985.797772780 691.511721395 10.102
`Overall Cond` 4408.326037234 597.805936786 7.374
`Year Built` 282.147007874 54.675948968 5.160
`Year Remod/Add` 122.626569337 37.292449547 3.288
`Roof Style`Gable 2120.941638087 10891.565244855 0.195
`Roof Style`Gambrel 1598.904587966 12087.604411462 0.132
`Roof Style`Hip 3137.767433601 10966.052796197 0.286
`Roof Style`Mansard 9436.433982878 14118.240634439 0.668
`Roof Style`Shed -83119.485453801 23674.954060902 -3.511
`Roof Matl`CompShg 855789.832979338 37977.626804734 22.534
`Roof Matl`Membran 926076.498151208 46103.954435478 20.087
`Roof Matl`Metal 917046.769086387 45791.379438357 20.027
`Roof Matl`Roll 850205.635940507 42990.184702002 19.777
`Roof Matl`Tar&Grv 848203.509461423 39194.549351860 21.641
`Roof Matl`WdShake 841573.272948880 40433.424683448 20.814
`Roof Matl`WdShngl 881467.830816248 40278.874165329 21.884
`Exterior 1st`AsphShn -9668.478857521 26352.942954093 -0.367
`Exterior 1st`BrkComm 3732.852629451 14141.358287826 0.264
`Exterior 1st`BrkFace 20031.931314214 9166.129667515 2.185
`Exterior 1st`CBlock -14190.167837977 31688.926596766 -0.448
`Exterior 1st`CemntBd -20272.448413724 16611.654612515 -1.220
`Exterior 1st`HdBoard 1855.735341950 8918.892979753 0.208
`Exterior 1st`ImStucc -6277.165692021 22613.635405549 -0.278
`Exterior 1st`MetalSd 5712.849874965 10040.354406210 0.569
`Exterior 1st`Plywood 3680.797451543 8704.964253197 0.423
`Exterior 1st`Stone -13639.391426253 25730.860755108 -0.530
`Exterior 1st`Stucco 7232.509029519 9930.814059556 0.728
`Exterior 1st`VinylSd -2023.874955067 9922.513493812 -0.204
`Exterior 1st`Wd Sdng -736.736591844 8606.391712483 -0.086
`Exterior 1st`WdShing 3110.937578872 9364.541589119 0.332
`Exterior 2nd`AsphShn -418.897060780 17717.368142020 -0.024
`Exterior 2nd`Brk Cmn -4135.443188971 14542.725070649 -0.284
`Exterior 2nd`BrkFace -8621.709018825 10320.916322863 -0.835
`Exterior 2nd`CBlock 13234.566153548 24163.997539080 0.548
`Exterior 2nd`CmentBd 27663.002190387 16912.675940046 1.636
`Exterior 2nd`HdBoard -5126.965000254 9427.455818355 -0.544
`Exterior 2nd`ImStucc -9735.888493941 11445.697298541 -0.851
`Exterior 2nd`MetalSd -3633.012830586 10463.589625969 -0.347
`Exterior 2nd`Plywood -4803.916757938 9064.724781964 -0.530
`Exterior 2nd`Stone -5829.028018905 15942.667035449 -0.366
`Exterior 2nd`Stucco 1581.461334555 10235.349930131 0.155
`Exterior 2nd`VinylSd 1828.587648856 10381.935808798 0.176
`Exterior 2nd`Wd Sdng 963.959307046 9172.397870184 0.105
`Exterior 2nd`Wd Shng -4515.193919337 9625.418533364 -0.469
`Mas Vnr Type`BrkCmn -14050.026907400 7699.213453026 -1.825
`Mas Vnr Type`BrkFace -10762.465705382 5790.065282801 -1.859
`Mas Vnr Type`CBlock NA NA NA
`Mas Vnr Type`None -4932.339002810 5678.233202331 -0.869
`Mas Vnr Type`Stone -3749.094596337 5841.729575660 -0.642
`Mas Vnr Area` 31.637822857 4.273496000 7.403
`Exter Qual`Fa -16692.489960352 6364.111655304 -2.623
`Exter Qual`Gd -21679.696035971 3367.752401530 -6.437
`Exter Qual`TA -21435.404329214 3792.789023081 -5.652
`Exter Cond`Fa 4078.163825959 11408.176919711 0.357
`Exter Cond`Gd 12689.227765807 10864.622310411 1.168
`Exter Cond`Po 16798.111661605 23880.429666943 0.703
`Exter Cond`TA 11156.028658736 10828.373842641 1.030
FoundationCBlock 41.616373597 2137.154287330 0.019
FoundationPConc 2246.074104370 2337.358355842 0.961
FoundationSlab -5931.908153480 6345.888078545 -0.935
FoundationStone 9277.777425305 7948.412367629 1.167
FoundationWood -887.422673468 10452.445493180 -0.085
`Bsmt Qual`100 49407.998616638 31703.657190322 1.558
`Bsmt Qual`Ex 11079.999012216 42466.402385232 0.261
`Bsmt Qual`Fa -94.474606214 42383.430183596 -0.002
`Bsmt Qual`Gd -6108.970356263 42383.196021625 -0.144
`Bsmt Qual`Po 38679.798577053 49727.446033519 0.778
`Bsmt Qual`TA -4994.883523941 42374.722099308 -0.118
`Bsmt Cond`100 NA NA NA
`Bsmt Cond`Ex -9854.875297349 14314.007457219 -0.688
`Bsmt Cond`Fa 762.497837664 2798.521714846 0.272
`Bsmt Cond`Gd -4071.439594172 2380.599024824 -1.710
`Bsmt Cond`Po 11501.652324216 10843.576863955 1.061
`Bsmt Cond`TA NA NA NA
`Bsmt Exposure`100 NA NA NA
`Bsmt Exposure`Av 20363.104738245 19593.917019656 1.039
`Bsmt Exposure`Gd 29903.626078071 19689.424245145 1.519
`Bsmt Exposure`Mn 12019.222679463 19621.025170792 0.613
`Bsmt Exposure`No 12963.414782307 19580.708779469 0.662
`BsmtFin Type 1`100 NA NA NA
`BsmtFin Type 1`ALQ -1605.927596581 2003.703086248 -0.801
`BsmtFin Type 1`BLQ -1203.706729506 2179.325535692 -0.552
`BsmtFin Type 1`GLQ 3150.370270406 1951.130891662 1.615
`BsmtFin Type 1`LwQ -7274.281117164 2509.730221646 -2.898
`BsmtFin Type 1`Rec -4837.234511168 2091.074014386 -2.313
`BsmtFin Type 1`Unf NA NA NA
`BsmtFin SF 1` 41.186121706 3.453958339 11.924
`BsmtFin Type 2`100 NA NA NA
`BsmtFin Type 2`ALQ 28547.015077716 20228.366129244 1.411
`BsmtFin Type 2`BLQ 18931.493105119 20161.545583787 0.939
`BsmtFin Type 2`GLQ 35556.098410121 20598.373875034 1.726
`BsmtFin Type 2`LwQ 17411.910653933 20176.218782885 0.863
`BsmtFin Type 2`Rec 17065.139193048 20127.630214292 0.848
`BsmtFin Type 2`Unf 21069.658780062 20096.933564336 1.048
`BsmtFin SF 2` 34.615079979 5.677438059 6.097
`Bsmt Unf SF` 22.740037689 3.114180916 7.302
`Total Bsmt SF` NA NA NA
HeatingGasA 8526.780004619 20737.157287672 0.411
HeatingGasW 3752.806340596 21502.285219862 0.175
HeatingGrav 3731.546146285 22659.690025587 0.165
HeatingOthW -35071.243383306 29449.914928764 -1.191
HeatingWall 16791.882667122 25766.157942165 0.652
`Heating QC`Fa -2458.207026243 3155.263148427 -0.779
`Heating QC`Gd -750.526955236 1425.584240472 -0.526
`Heating QC`Po -10634.538037207 15109.540405587 -0.704
`Heating QC`TA -2990.856894739 1412.656029074 -2.117
`Central Air`Y -3077.394525350 2561.791821673 -1.201
ElectricalFuseA -19889.850452420 19833.479454012 -1.003
ElectricalFuseF -19958.223963087 20103.876943605 -0.993
ElectricalFuseP -21298.514561003 21566.820214314 -0.988
ElectricalMix 1961.490182812 34374.256398163 0.057
ElectricalSBrkr -18884.309028053 19737.972131556 -0.957
`1st Flr SF` 49.257726885 3.640020483 13.532
`2nd Flr SF` 61.816616479 3.778960634 16.358
`Low Qual Fin SF` 44.178429771 11.985204829 3.686
`Gr Liv Area` NA NA NA
`Bsmt Full Bath` 927.513953582 1316.818316961 0.704
`Bsmt Half Bath` -2141.467007572 2067.068158218 -1.036
`Full Bath` 4376.890833620 1492.497364222 2.933
`Half Bath` 1580.357066467 1399.458172771 1.129
`Bedroom AbvGr` -4943.726904236 941.242618845 -5.252
`Kitchen AbvGr` -8713.852102296 4148.797280807 -2.100
`Kitchen Qual`Fa -19650.635353892 4235.264309327 -4.640
`Kitchen Qual`Gd -18986.749407913 2499.766481978 -7.595
`Kitchen Qual`Po 16128.814658408 21493.948968265 0.750
`Kitchen Qual`TA -19502.220557043 2776.217374602 -7.025
`TotRms AbvGrd` 710.294676933 633.048643669 1.122
FunctionalMaj2 -8881.430314149 10240.094641717 -0.867
FunctionalMin1 9343.164555113 6600.809966892 1.415
FunctionalMin2 8186.567176624 6615.544902465 1.237
FunctionalMod 1608.120627836 7353.275539844 0.219
FunctionalSev -61489.187898232 23683.153489748 -2.596
FunctionalTyp 18049.857950913 5882.310609132 3.068
Fireplaces 5305.975131971 1751.691597254 3.029
`Fireplace Qu`Ex 1659.256353700 5232.735747007 0.317
`Fireplace Qu`Fa -5538.301068032 3476.844987211 -1.593
`Fireplace Qu`Gd -4705.021176442 2420.045286792 -1.944
`Fireplace Qu`Po 418.463920052 4075.210836543 0.103
`Fireplace Qu`TA -4990.958579089 2500.316257370 -1.996
`Garage Type`2Types -49232.401952557 21873.008991288 -2.251
`Garage Type`Attchd -42177.911752276 21188.239489271 -1.991
`Garage Type`Basment -43763.506241151 21749.294434149 -2.012
`Garage Type`BuiltIn -36290.775944173 21315.831729980 -1.703
`Garage Type`CarPort -32239.520533962 22169.281230979 -1.454
`Garage Type`Detchd -38553.400289948 21146.807394215 -1.823
`Garage Yr Blt` 15.767065415 38.595297896 0.409
`Garage Finish`100 NA NA NA
`Garage Finish`Fin 30416.092542911 20927.527912696 1.453
`Garage Finish`RFn 28123.728672697 20935.176937734 1.343
`Garage Finish`Unf 31079.112661281 20900.010528160 1.487
`Garage Cars` 2686.262377558 1552.218996329 1.731
`Garage Area` 19.428739021 5.273638938 3.684
`Garage Qual`100 NA NA NA
`Garage Qual`Ex 92404.419674972 18300.489234965 5.049
`Garage Qual`Fa 717.708893831 2820.054899160 0.255
`Garage Qual`Gd 15367.396339833 5725.752240138 2.684
`Garage Qual`Po -26649.651181852 16644.228344952 -1.601
`Garage Qual`TA NA NA NA
`Garage Cond`100 NA NA NA
`Garage Cond`Ex -68620.542576261 17011.897702465 -4.034
`Garage Cond`Fa -1106.833205324 3559.172328606 -0.311
`Garage Cond`Gd -4795.440885857 5987.076422230 -0.801
`Garage Cond`Po 8812.847211582 8576.909288945 1.028
`Garage Cond`TA NA NA NA
`Paved Drive`P -6751.442075623 3498.850233545 -1.930
`Paved Drive`Y 924.331453944 2293.184015586 0.403
`Wood Deck SF` 5.015378248 4.004434545 1.252
`Open Porch SF` 5.715202525 7.797118393 0.733
`Enclosed Porch` 8.775678283 8.174292439 1.074
`3Ssn Porch` -11.585333979 17.433022599 -0.665
`Screen Porch` 38.919130253 8.585508735 4.533
`Pool Area` -514.976212986 75.841420952 -6.790
`Pool QC`Ex 182691.160885135 24824.395301088 7.359
`Pool QC`Fa 357474.854238262 52732.260144411 6.779
`Pool QC`Gd 414509.067440474 60407.692414329 6.862
`Pool QC`TA 254730.592373981 40587.829374977 6.276
FenceGdPrv 222.326363771 2507.881317324 0.089
FenceGdWo 2527.434989544 2418.603489599 1.045
FenceMnPrv 2165.307775202 1498.393049991 1.445
FenceMnWw -3109.017454244 6403.388622226 -0.486
`Misc Feature`Elev -574161.218420959 50213.282397723 -11.434
`Misc Feature`Gar2 -396.345678670 45312.627122451 -0.009
`Misc Feature`Othr 23124.793231749 13261.949434841 1.744
`Misc Feature`Shed -1216.271427454 3487.856047627 -0.349
`Misc Val` 0.183374609 2.650889752 0.069
`Mo Sold` -94.741267271 171.159155761 -0.554
`Yr Sold` -2228.017128617 4125.948325317 -0.540
`Sale Type`Con 41579.666700293 12551.867811256 3.313
`Sale Type`ConLD 3571.734446365 6320.640160031 0.565
`Sale Type`ConLI -1480.295048490 9333.317303968 -0.159
`Sale Type`ConLw 9717.027418904 9713.828504796 1.000
`Sale Type`CWD 14232.486325140 7593.407646463 1.874
`Sale Type`New 19406.757044242 11340.696154597 1.711
`Sale Type`Oth 4747.173097708 8005.022606619 0.593
`Sale Type`VWD -4455.477341487 20285.683843424 -0.220
`Sale Type`WD 4049.339376418 2865.361098598 1.413
`Sale Condition`AdjLand 20872.917483362 7847.212139585 2.660
`Sale Condition`Alloca 23094.835230651 6037.808019010 3.825
`Sale Condition`Family 3175.890906950 4014.424311894 0.791
`Sale Condition`Normal 6447.061273910 2086.665882867 3.090
`Sale Condition`Partial 7044.612680626 10913.483654456 0.645
Pr(>|t|)
(Intercept) 0.742747
Order 0.719030
PID 0.243744
`MS SubClass` 0.092116 .
`MS Zoning`C (all) 0.496784
`MS Zoning`FV 0.607161
`MS Zoning`RH 0.770799
`MS Zoning`RL 0.703045
`MS Zoning`RM 0.597483
`Lot Frontage` 0.002792 **
`Lot Area` < 0.0000000000000002 ***
StreetPave 0.000570 ***
AlleyGrvl 0.781435
AlleyPave 0.411358
`Lot Shape`IR2 0.925448
`Lot Shape`IR3 0.314527
`Lot Shape`Reg 0.780542
`Land Contour`HLS 0.305564
`Land Contour`Low 0.066751 .
`Land Contour`Lvl 0.948574
UtilitiesNoSeWa 0.678673
`Lot Config`CulDSac 0.014832 *
`Lot Config`FR2 0.151988
`Lot Config`FR3 0.097542 .
`Lot Config`Inside 0.124134
`Land Slope`Mod 0.052884 .
`Land Slope`Sev 0.0000010668776910 ***
NeighborhoodIDOTRR 0.865842
NeighborhoodBrDale 0.021309 *
NeighborhoodOldTown 0.685131
NeighborhoodBrkSide 0.099139 .
NeighborhoodEdwards 0.689908
NeighborhoodSWISU 0.667116
NeighborhoodSawyer 0.339962
NeighborhoodNPkVill 0.112720
NeighborhoodBlueste 0.073666 .
NeighborhoodNAmes 0.681601
NeighborhoodMitchel 0.400377
NeighborhoodSawyerW 0.933041
NeighborhoodNWAmes 0.843630
NeighborhoodGilbert 0.754739
NeighborhoodGreens 0.088266 .
NeighborhoodBlmngtn 0.085207 .
NeighborhoodCollgCr 0.981612
NeighborhoodCrawfor 0.007044 **
NeighborhoodClearCr 0.655914
NeighborhoodSomerst 0.016007 *
NeighborhoodTimber 0.458509
NeighborhoodVeenker 0.139353
NeighborhoodGrnHill 0.0000000005466222 ***
NeighborhoodNridgHt 0.003811 **
NeighborhoodStoneBr 0.0000001079749159 ***
NeighborhoodNoRidge 0.0000416735767580 ***
`Condition 1`Feedr 0.457127
`Condition 1`Norm 0.0000668211354976 ***
`Condition 1`PosA 0.170790
`Condition 1`PosN 0.000641 ***
`Condition 1`RRAe 0.725271
`Condition 1`RRAn 0.162972
`Condition 1`RRNe 0.081058 .
`Condition 1`RRNn 0.803761
`Condition 2`Feedr 0.541901
`Condition 2`Norm 0.154164
`Condition 2`PosA 0.006543 **
`Condition 2`PosN < 0.0000000000000002 ***
`Condition 2`RRAe 0.056263 .
`Condition 2`RRAn 0.306154
`Condition 2`RRNn 0.461153
`Bldg Type`2fmCon 0.874375
`Bldg Type`Duplex 0.000147 ***
`Bldg Type`Twnhs 0.039018 *
`Bldg Type`TwnhsE 0.205610
`House Style`1.5Fin 0.115506
`House Style`SFoyer 0.545309
`House Style`SLvl 0.322717
`House Style`2.5Unf 0.434318
`House Style`1Story 0.511183
`House Style`2Story 0.223974
`House Style`2.5Fin 0.007007 **
`Overall Qual` < 0.0000000000000002 ***
`Overall Cond` 0.0000000000002517 ***
`Year Built` 0.0000002739041745 ***
`Year Remod/Add` 0.001028 **
`Roof Style`Gable 0.845624
`Roof Style`Gambrel 0.894781
`Roof Style`Hip 0.774808
`Roof Style`Mansard 0.503974
`Roof Style`Shed 0.000458 ***
`Roof Matl`CompShg < 0.0000000000000002 ***
`Roof Matl`Membran < 0.0000000000000002 ***
`Roof Matl`Metal < 0.0000000000000002 ***
`Roof Matl`Roll < 0.0000000000000002 ***
`Roof Matl`Tar&Grv < 0.0000000000000002 ***
`Roof Matl`WdShake < 0.0000000000000002 ***
`Roof Matl`WdShngl < 0.0000000000000002 ***
`Exterior 1st`AsphShn 0.713749
`Exterior 1st`BrkComm 0.791836
`Exterior 1st`BrkFace 0.028987 *
`Exterior 1st`CBlock 0.654355
`Exterior 1st`CemntBd 0.222484
`Exterior 1st`HdBoard 0.835200
`Exterior 1st`ImStucc 0.781364
`Exterior 1st`MetalSd 0.569435
`Exterior 1st`Plywood 0.672464
`Exterior 1st`Stone 0.596123
`Exterior 1st`Stucco 0.466532
`Exterior 1st`VinylSd 0.838402
`Exterior 1st`Wd Sdng 0.931791
`Exterior 1st`WdShing 0.739774
`Exterior 2nd`AsphShn 0.981140
`Exterior 2nd`Brk Cmn 0.776164
`Exterior 2nd`BrkFace 0.403625
`Exterior 2nd`CBlock 0.583968
`Exterior 2nd`CmentBd 0.102092
`Exterior 2nd`HdBoard 0.586624
`Exterior 2nd`ImStucc 0.395097
`Exterior 2nd`MetalSd 0.728478
`Exterior 2nd`Plywood 0.596207
`Exterior 2nd`Stone 0.714689
`Exterior 2nd`Stucco 0.877225
`Exterior 2nd`VinylSd 0.860210
`Exterior 2nd`Wd Sdng 0.916313
`Exterior 2nd`Wd Shng 0.639062
`Mas Vnr Type`BrkCmn 0.068188 .
`Mas Vnr Type`BrkFace 0.063222 .
`Mas Vnr Type`CBlock NA
`Mas Vnr Type`None 0.385161
`Mas Vnr Type`Stone 0.521099
`Mas Vnr Area` 0.0000000000002036 ***
`Exter Qual`Fa 0.008792 **
`Exter Qual`Gd 0.0000000001556771 ***
`Exter Qual`TA 0.0000000184611421 ***
`Exter Cond`Fa 0.720777
`Exter Cond`Gd 0.242987
`Exter Cond`Po 0.481882
`Exter Cond`TA 0.303028
FoundationCBlock 0.984466
FoundationPConc 0.336709
FoundationSlab 0.350036
FoundationStone 0.243265
FoundationWood 0.932350
`Bsmt Qual`100 0.119308
`Bsmt Qual`Ex 0.794190
`Bsmt Qual`Fa 0.998222
`Bsmt Qual`Gd 0.885409
`Bsmt Qual`Po 0.436769
`Bsmt Qual`TA 0.906181
`Bsmt Cond`100 NA
`Bsmt Cond`Ex 0.491241
`Bsmt Cond`Fa 0.785296
`Bsmt Cond`Gd 0.087392 .
`Bsmt Cond`Po 0.288975
`Bsmt Cond`TA NA
`Bsmt Exposure`100 NA
`Bsmt Exposure`Av 0.298826
`Bsmt Exposure`Gd 0.128998
`Bsmt Exposure`Mn 0.540240
`Bsmt Exposure`No 0.508024
`BsmtFin Type 1`100 NA
`BsmtFin Type 1`ALQ 0.422960
`BsmtFin Type 1`BLQ 0.580791
`BsmtFin Type 1`GLQ 0.106566
`BsmtFin Type 1`LwQ 0.003796 **
`BsmtFin Type 1`Rec 0.020820 *
`BsmtFin Type 1`Unf NA
`BsmtFin SF 1` < 0.0000000000000002 ***
`BsmtFin Type 2`100 NA
`BsmtFin Type 2`ALQ 0.158349
`BsmtFin Type 2`BLQ 0.347863
`BsmtFin Type 2`GLQ 0.084491 .
`BsmtFin Type 2`LwQ 0.388258
`BsmtFin Type 2`Rec 0.396637
`BsmtFin Type 2`Unf 0.294595
`BsmtFin SF 2` 0.0000000013214230 ***
`Bsmt Unf SF` 0.0000000000004245 ***
`Total Bsmt SF` NA
HeatingGasA 0.680987
HeatingGasW 0.861468
HeatingGrav 0.869216
HeatingOthW 0.233860
HeatingWall 0.514677
`Heating QC`Fa 0.436035
`Heating QC`Gd 0.598627
`Heating QC`Po 0.481631
`Heating QC`TA 0.034382 *
`Central Air`Y 0.229807
ElectricalFuseA 0.316073
ElectricalFuseF 0.320964
ElectricalFuseP 0.323502
ElectricalMix 0.954502
ElectricalSBrkr 0.338823
`1st Flr SF` < 0.0000000000000002 ***
`2nd Flr SF` < 0.0000000000000002 ***
`Low Qual Fin SF` 0.000235 ***
`Gr Liv Area` NA
`Bsmt Full Bath` 0.481300
`Bsmt Half Bath` 0.300346
`Full Bath` 0.003404 **
`Half Bath` 0.258938
`Bedroom AbvGr` 0.0000001681236060 ***
`Kitchen AbvGr` 0.035839 *
`Kitchen Qual`Fa 0.0000037415657822 ***
`Kitchen Qual`Gd 0.0000000000000492 ***
`Kitchen Qual`Po 0.453119
`Kitchen Qual`TA 0.0000000000030367 ***
`TotRms AbvGrd` 0.262004
FunctionalMaj2 0.385884
FunctionalMin1 0.157109
FunctionalMin2 0.216074
FunctionalMod 0.826913
FunctionalSev 0.009500 **
FunctionalTyp 0.002184 **
Fireplaces 0.002488 **
`Fireplace Qu`Ex 0.751211
`Fireplace Qu`Fa 0.111357
`Fireplace Qu`Gd 0.052030 .
`Fireplace Qu`Po 0.918224
`Fireplace Qu`TA 0.046071 *
`Garage Type`2Types 0.024517 *
`Garage Type`Attchd 0.046674 *
`Garage Type`Basment 0.044351 *
`Garage Type`BuiltIn 0.088830 .
`Garage Type`CarPort 0.146054
`Garage Type`Detchd 0.068450 .
`Garage Yr Blt` 0.682939
`Garage Finish`100 NA
`Garage Finish`Fin 0.146288
`Garage Finish`RFn 0.179322
`Garage Finish`Unf 0.137181
`Garage Cars` 0.083697 .
`Garage Area` 0.000236 ***
`Garage Qual`100 NA
`Garage Qual`Ex 0.0000004886353514 ***
`Garage Qual`Fa 0.799137
`Garage Qual`Gd 0.007344 **
`Garage Qual`Po 0.109524
`Garage Qual`TA NA
`Garage Cond`100 NA
`Garage Cond`Ex 0.0000572213962438 ***
`Garage Cond`Fa 0.755852
`Garage Cond`Gd 0.423258
`Garage Cond`Po 0.304320
`Garage Cond`TA NA
`Paved Drive`P 0.053812 .
`Paved Drive`Y 0.686939
`Wood Deck SF` 0.210568
`Open Porch SF` 0.463661
`Enclosed Porch` 0.283160
`3Ssn Porch` 0.506416
`Screen Porch` 0.0000061973196891 ***
`Pool Area` 0.0000000000151917 ***
`Pool QC`Ex 0.0000000000002804 ***
`Pool QC`Fa 0.0000000000163759 ***
`Pool QC`Gd 0.0000000000093372 ***
`Pool QC`TA 0.0000000004347687 ***
FenceGdPrv 0.929369
FenceGdWo 0.296165
FenceMnPrv 0.148609
FenceMnWw 0.627362
`Misc Feature`Elev < 0.0000000000000002 ***
`Misc Feature`Gar2 0.993022
`Misc Feature`Othr 0.081384 .
`Misc Feature`Shed 0.727344
`Misc Val` 0.944858
`Mo Sold` 0.579972
`Yr Sold` 0.589263
`Sale Type`Con 0.000943 ***
`Sale Type`ConLD 0.572083
`Sale Type`ConLI 0.873999
`Sale Type`ConLw 0.317287
`Sale Type`CWD 0.061049 .
`Sale Type`New 0.087209 .
`Sale Type`Oth 0.553240
`Sale Type`VWD 0.826179
`Sale Type`WD 0.157770
`Sale Condition`AdjLand 0.007886 **
`Sale Condition`Alloca 0.000135 ***
`Sale Condition`Family 0.428979
`Sale Condition`Normal 0.002035 **
`Sale Condition`Partial 0.518688
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 19040 on 1788 degrees of freedom
Multiple R-squared: 0.9493, Adjusted R-squared: 0.9417
F-statistic: 126.2 on 265 and 1788 DF, p-value: < 0.00000000000000022
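Several rows in the summary above have NA estimates, for example `Total Bsmt SF` and `Gr Liv Area`. In the Ames data these columns appear to be exact sums of other columns. `Total Bsmt SF` looks like `BsmtFin SF 1` + `BsmtFin SF 2` + `Bsmt Unf SF`, and `Gr Liv Area` looks like `1st Flr SF` + `2nd Flr SF` + `Low Qual Fin SF`. When one column is an exact linear combination of others, `lm()` drops it and reports NA. The sketch below reproduces this on simulated data. The column names and values are made up for illustration and are not the Ames data. It uses `alias()` from base R to show which columns are involved.

```r
# Simulated data (hypothetical, not the Ames data): gr_liv_area is the exact
# sum of two other columns, mimicking `Gr Liv Area` in the Ames model.
set.seed(1)
n <- 50
toy <- data.frame(first_flr  = runif(n, 500, 1500),
                  second_flr = runif(n, 0, 1000))
toy$gr_liv_area <- toy$first_flr + toy$second_flr   # exact linear combination
toy$price <- 100 * toy$gr_liv_area + rnorm(n, sd = 5000)

fit <- lm(price ~ first_flr + second_flr + gr_liv_area, data = toy)

coef(fit)              # gr_liv_area is NA because it is aliased
alias(fit)$Complete    # row gr_liv_area = 1 * first_flr + 1 * second_flr
```

Running `alias()` on the full Ames fit should flag the same kinds of columns. Keeping either the total or its components, but not both, removes those NA rows.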
Can I use AIC to determine which variables I need to use in my linear regression model?
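Yes. One common approach is stepwise selection with base R's `step()`, which adds and removes terms to lower the AIC, 2k − 2 log L. `MASS::stepAIC()` is a close alternative. Stepwise search is greedy, so it can miss the best subset. The p-values reported for the selected model are also optimistic because the same data chose the terms. The sketch below runs on the built-in `mtcars` data rather than the Ames data.

```r
# Sketch on mtcars (not the Ames data): start from the full model and let
# step() add/drop terms in both directions to minimize AIC.
full <- lm(mpg ~ ., data = mtcars)
reduced <- step(full, direction = "both", trace = 0)  # trace = 0 hides the log

formula(reduced)                                      # terms that were kept
c(full = AIC(full), reduced = AIC(reduced))           # lower is better
```

Passing `k = log(nrow(data))` to `step()` penalizes by BIC instead of AIC and usually keeps fewer terms. With 265 coefficients, some of them aliased, the search on the full Ames model will be slow. It may help to remove the aliased columns first.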
“If it weren’t for my lawyer, I’d still be in prison. It went a lot faster with two people digging.”
lambda <- 10^seq(-3, 3, length = 100)
lassoFit <-
train(
SalePrice ~ .,
data = train,
method = "glmnet",
trControl = ctrl,
preProcess = c("center", "scale"),
tuneGrid = expand.grid(alpha = 1, lambda = lambda))
Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
10, : These variables have zero variances: `MS Zoning`I (all), UtilitiesNoSewr,
NeighborhoodLandmrk, `Condition 2`RRNn, `Roof Matl`Membran, `Roof Matl`Metal,
`Exterior 1st`ImStucc, `Exterior 1st`PreCast, `Exterior 2nd`Other, `Exterior
2nd`PreCast, ElectricalMix, FunctionalSal, FunctionalSev, `Misc Feature`TenC
Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
10, : These variables have zero variances: `MS Zoning`I (all), UtilitiesNoSewr,
NeighborhoodLandmrk, NeighborhoodGrnHill, `Condition 2`RRAe, `Exterior
1st`AsphShn, `Exterior 1st`PreCast, `Exterior 1st`Stone, `Exterior 2nd`Other,
`Exterior 2nd`PreCast, `Exter Cond`Po, `Bsmt Qual`Po, FunctionalSal, `Misc
Feature`Elev, `Misc Feature`TenC
Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
10, : These variables have zero variances: `MS Zoning`I (all), UtilitiesNoSewr,
NeighborhoodLandmrk, `Condition 2`RRAn, `Exterior 1st`PreCast, `Exterior
2nd`Other, `Exterior 2nd`PreCast, `Mas Vnr Type`CBlock, FunctionalSal, `Misc
Feature`TenC
Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
10, : These variables have zero variances: `MS Zoning`I (all), UtilitiesNoSewr,
NeighborhoodLandmrk, `Exterior 1st`PreCast, `Exterior 2nd`Other, `Exterior
2nd`PreCast, HeatingOthW, FunctionalSal, `Pool QC`Fa, `Misc Feature`TenC
Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
10, : These variables have zero variances: `MS Zoning`I (all), UtilitiesNoSeWa,
UtilitiesNoSewr, NeighborhoodLandmrk, `Condition 2`PosN, `Roof Matl`Roll,
`Exterior 1st`PreCast, `Exterior 2nd`Other, `Exterior 2nd`PreCast, `Kitchen
Qual`Po, FunctionalSal, `Misc Feature`TenC, `Sale Type`VWD
Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
10, : These variables have zero variances: `MS Zoning`I (all), UtilitiesNoSewr,
NeighborhoodLandmrk, `Exterior 1st`ImStucc, `Exterior 1st`PreCast, `Exterior
2nd`Other, `Exterior 2nd`PreCast, `Mas Vnr Type`CBlock, HeatingOthW,
FunctionalSal, `Misc Feature`TenC, `Sale Type`VWD
Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
10, : These variables have zero variances: `MS Zoning`I (all), UtilitiesNoSewr,
NeighborhoodLandmrk, `Condition 2`PosN, `Exterior 1st`PreCast, `Exterior
2nd`Other, `Exterior 2nd`PreCast, FunctionalSal, `Pool QC`Fa, `Misc
Feature`Elev, `Misc Feature`TenC
Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
10, : These variables have zero variances: `MS Zoning`I (all), UtilitiesNoSeWa,
UtilitiesNoSewr, NeighborhoodLandmrk, `Condition 2`RRNn, `Roof Matl`Membran,
`Exterior 1st`AsphShn, `Exterior 1st`PreCast, `Exterior 2nd`Other, `Exterior
2nd`PreCast, ElectricalMix, FunctionalSal, `Misc Feature`TenC
Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
10, : These variables have zero variances: `MS Zoning`I (all), UtilitiesNoSewr,
NeighborhoodLandmrk, `Roof Matl`Metal, `Exterior 1st`PreCast, `Exterior
1st`Stone, `Exterior 2nd`Other, `Exterior 2nd`PreCast, `Exter Cond`Po, `Bsmt
Qual`Po, FunctionalSal, `Misc Feature`TenC
Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
10, : These variables have zero variances: `MS Zoning`I (all), UtilitiesNoSewr,
NeighborhoodLandmrk, NeighborhoodGrnHill, `Condition 2`RRAe, `Condition 2`RRAn,
`Roof Matl`Roll, `Exterior 1st`PreCast, `Exterior 2nd`Other, `Exterior
2nd`PreCast, `Kitchen Qual`Po, FunctionalSal, FunctionalSev, `Misc
Feature`Gar2, `Misc Feature`TenC
Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
10, : These variables have zero variances: `MS Zoning`I (all), UtilitiesNoSewr,
NeighborhoodLandmrk, `Condition 2`RRAe, `Roof Style`Shed, `Roof Matl`Metal,
`Exterior 1st`ImStucc, `Exterior 1st`PreCast, `Exterior 2nd`Other, `Exterior
2nd`PreCast, `Mas Vnr Type`CBlock, FunctionalSal, `Pool QC`Fa, `Misc
Feature`TenC
Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
10, : These variables have zero variances: `MS Zoning`I (all), UtilitiesNoSeWa,
UtilitiesNoSewr, NeighborhoodLandmrk, NeighborhoodGrnHill, `Exterior
1st`PreCast, `Exterior 2nd`Other, `Exterior 2nd`PreCast, `Exter Cond`Po,
FunctionalSal, `Misc Feature`Elev, `Misc Feature`TenC, `Sale Type`VWD
Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
10, : These variables have zero variances: `MS Zoning`I (all), UtilitiesNoSewr,
NeighborhoodLandmrk, `Condition 2`RRAn, `Roof Matl`Membran, `Roof Matl`Roll,
`Exterior 1st`PreCast, `Exterior 2nd`Other, `Exterior 2nd`PreCast, `Bsmt
Qual`Po, FunctionalSal, `Misc Feature`TenC
Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
10, : These variables have zero variances: `MS Zoning`I (all), UtilitiesNoSewr,
NeighborhoodLandmrk, `Condition 2`PosN, `Condition 2`RRNn, `Exterior
1st`AsphShn, `Exterior 1st`PreCast, `Exterior 1st`Stone, `Exterior 2nd`Other,
`Exterior 2nd`PreCast, FunctionalSal, FunctionalSev, `Misc Feature`TenC
Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
10, : These variables have zero variances: `MS Zoning`I (all), UtilitiesNoSewr,
NeighborhoodLandmrk, `Exterior 1st`PreCast, `Exterior 2nd`Other, `Exterior
2nd`PreCast, HeatingOthW, ElectricalMix, `Kitchen Qual`Po, FunctionalSal, `Misc
Feature`TenC
Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
10, : These variables have zero variances: `MS Zoning`I (all), UtilitiesNoSewr,
NeighborhoodLandmrk, `Exterior 1st`PreCast, `Exterior 2nd`Other, `Exterior
2nd`PreCast, FunctionalSal, `Misc Feature`TenC
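The warnings above come from the `"center"`/`"scale"` step inside `train()`. In some cross-validation folds, rare dummy levels such as `MS Zoning`I (all) or `Exterior 1st`PreCast take only one value, so their variance is zero and they cannot be scaled. caret's `preProcess` also accepts `"zv"`, which removes zero-variance predictors, and `"nzv"`, which removes near-zero-variance ones. Listing `"zv"` before centering and scaling should silence most of these warnings. The toy data below is a hypothetical stand-in for such a fold.

```r
library(caret)

# Hypothetical toy data: 'const' plays the role of a dummy level that never
# occurs in a given resample, so its variance is zero.
set.seed(42)
toy <- data.frame(x1 = rnorm(20), x2 = runif(20), const = rep(0, 20))

# "zv" drops zero-variance columns; the remaining ones are centered and scaled.
pp <- preProcess(toy, method = c("zv", "center", "scale"))
toy_pp <- predict(pp, toy)

names(toy_pp)   # 'const' has been removed
```

In the lasso call, the equivalent change would be `preProcess = c("zv", "center", "scale")`.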
lassoFit # to obtain summary of the model
glmnet
2054 samples
81 predictor
Pre-processing: centered (287), scaled (287)
Resampling: Cross-Validated (5 fold, repeated 3 times)
Summary of sample sizes: 1644, 1643, 1644, 1642, 1643, 1643, ...
Resampling results across tuning parameters:
lambda RMSE Rsquared MAE
0.001000000 42040.76 0.7599078 17175.04
0.001149757 42040.76 0.7599078 17175.04
0.001321941 42040.76 0.7599078 17175.04
0.001519911 42040.76 0.7599078 17175.04
0.001747528 42040.76 0.7599078 17175.04
0.002009233 42040.76 0.7599078 17175.04
0.002310130 42040.76 0.7599078 17175.04
0.002656088 42040.76 0.7599078 17175.04
0.003053856 42040.76 0.7599078 17175.04
0.003511192 42040.76 0.7599078 17175.04
0.004037017 42040.76 0.7599078 17175.04
0.004641589 42040.76 0.7599078 17175.04
0.005336699 42040.76 0.7599078 17175.04
0.006135907 42040.76 0.7599078 17175.04
0.007054802 42040.76 0.7599078 17175.04
0.008111308 42040.76 0.7599078 17175.04
0.009326033 42040.76 0.7599078 17175.04
0.010722672 42040.76 0.7599078 17175.04
0.012328467 42040.76 0.7599078 17175.04
0.014174742 42040.76 0.7599078 17175.04
0.016297508 42040.76 0.7599078 17175.04
0.018738174 42040.76 0.7599078 17175.04
0.021544347 42040.76 0.7599078 17175.04
0.024770764 42040.76 0.7599078 17175.04
0.028480359 42040.76 0.7599078 17175.04
0.032745492 42040.76 0.7599078 17175.04
0.037649358 42040.76 0.7599078 17175.04
0.043287613 42040.76 0.7599078 17175.04
0.049770236 42040.76 0.7599078 17175.04
0.057223677 42040.76 0.7599078 17175.04
0.065793322 42040.76 0.7599078 17175.04
0.075646333 42040.76 0.7599078 17175.04
0.086974900 42040.76 0.7599078 17175.04
0.100000000 42040.76 0.7599078 17175.04
0.114975700 42040.76 0.7599078 17175.04
0.132194115 42040.76 0.7599078 17175.04
0.151991108 42040.76 0.7599078 17175.04
0.174752840 42040.76 0.7599078 17175.04
0.200923300 42040.76 0.7599078 17175.04
0.231012970 42040.76 0.7599078 17175.04
0.265608778 42040.76 0.7599078 17175.04
0.305385551 42040.76 0.7599078 17175.04
0.351119173 42040.76 0.7599078 17175.04
0.403701726 42040.76 0.7599078 17175.04
0.464158883 42040.76 0.7599078 17175.04
0.533669923 42040.76 0.7599078 17175.04
0.613590727 42040.76 0.7599078 17175.04
0.705480231 42040.76 0.7599078 17175.04
0.811130831 42040.76 0.7599078 17175.04
0.932603347 42040.76 0.7599078 17175.04
1.072267222 42040.76 0.7599078 17175.04
1.232846739 42040.76 0.7599078 17175.04
1.417474163 42040.76 0.7599078 17175.04
1.629750835 42040.76 0.7599078 17175.04
1.873817423 42040.76 0.7599078 17175.04
2.154434690 42040.76 0.7599078 17175.04
2.477076356 42040.76 0.7599078 17175.04
2.848035868 42040.76 0.7599078 17175.04
3.274549163 42040.76 0.7599078 17175.04
3.764935807 42040.76 0.7599078 17175.04
4.328761281 42040.76 0.7599078 17175.04
4.977023564 42040.76 0.7599078 17175.04
5.722367659 42040.76 0.7599078 17175.04
6.579332247 42023.89 0.7600321 17173.61
7.564633276 41951.23 0.7606074 17166.22
8.697490026 41878.76 0.7611924 17156.93
10.000000000 41799.00 0.7618452 17144.98
11.497569954 41712.03 0.7625552 17130.70
13.219411485 41611.38 0.7633766 17113.74
15.199110830 41501.39 0.7642807 17095.03
17.475284000 41374.13 0.7653240 17074.52
20.092330026 41211.27 0.7666425 17048.57
23.101297001 41008.27 0.7682707 17018.70
26.560877829 40793.52 0.7699940 16988.01
30.538555088 40559.24 0.7718856 16953.81
35.111917342 40300.79 0.7739868 16913.57
40.370172586 40019.69 0.7762760 16868.23
46.415888336 39721.79 0.7786957 16823.81
53.366992312 39409.51 0.7812113 16779.75
61.359072734 39076.49 0.7838829 16730.84
70.548023107 38691.25 0.7869840 16676.99
81.113083079 38255.95 0.7905019 16620.05
93.260334688 37860.79 0.7936179 16564.54
107.226722201 37402.78 0.7972634 16509.88
123.284673944 36861.19 0.8016515 16449.64
141.747416293 36222.75 0.8068904 16379.65
162.975083462 35527.48 0.8126305 16309.50
187.381742286 34776.94 0.8188339 16241.99
215.443469003 33983.82 0.8253850 16178.59
247.707635599 33257.03 0.8313787 16134.37
284.803586844 32526.91 0.8373352 16106.50
327.454916288 31830.16 0.8429469 16098.60
376.493580679 31207.42 0.8479235 16110.69
432.876128108 30789.44 0.8511658 16141.42
497.702356433 30640.48 0.8521846 16177.97
572.236765935 30587.69 0.8524475 16229.34
657.933224658 30559.38 0.8525005 16311.31
756.463327555 30547.59 0.8524189 16388.31
869.749002618 30541.41 0.8522876 16460.19
1000.000000000 30536.34 0.8521575 16545.33
Tuning parameter 'alpha' was held constant at a value of 1
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were alpha = 1 and lambda = 1000.
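The selected lambda = 1000 is the largest value in `10^seq(-3, 3, length = 100)`, and RMSE is still decreasing at that end of the grid. The true optimum may therefore lie above the grid. The sketch below checks whether the chosen value sits on an edge of the grid. The helper `lambda_at_edge` is hypothetical, and the wider range `10^seq(-1, 5, ...)` is only an illustrative guess.

```r
# Hypothetical helper: TRUE when the selected lambda equals the smallest or
# largest value in the tuning grid, a hint that the grid should be widened.
lambda_at_edge <- function(best_lambda, grid) {
  isTRUE(all.equal(best_lambda, min(grid))) ||
    isTRUE(all.equal(best_lambda, max(grid)))
}

lambda <- 10^seq(-3, 3, length = 100)   # grid used above
lambda_at_edge(1000, lambda)            # value reported by caret: TRUE

# Illustrative wider grid (the upper limit 10^5 is an assumption).
lambda_wide <- 10^seq(-1, 5, length = 100)
```

With the fitted object, the check would be `lambda_at_edge(lassoFit$bestTune$lambda, lambda)`. You would then refit with `tuneGrid = expand.grid(alpha = 1, lambda = lambda_wide)`.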
varImp(lassoFit) # to see the most important predictors
glmnet variable importance
only 20 most important variables shown (out of 287)
Overall
`Gr Liv Area` 100.00
`Overall Qual` 64.67
`Misc Feature`Elev 35.36
`Bsmt Qual`Ex 34.89
`Condition 2`PosN 27.94
NeighborhoodNridgHt 27.35
`MS SubClass` 25.70
`Bsmt Exposure`Gd 20.69
NeighborhoodStoneBr 20.39
NeighborhoodNoRidge 19.84
`Sale Type`New 18.82
`Pool QC`Gd 18.47
`Year Built` 18.28
`Mas Vnr Area` 17.13
`BsmtFin SF 1` 16.79
`Total Bsmt SF` 16.08
`Garage Cars` 15.42
`Overall Cond` 14.57
`Lot Area` 13.28
Fireplaces 12.61
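For glmnet models, caret's `varImp()` ranks predictors by the absolute value of their coefficients at the selected lambda, rescaled to 0–100. Because the predictors were centered and scaled, these coefficients are comparable. The table above does not show which of the 287 columns the lasso actually set to zero. For the caret fit, `coef(lassoFit$finalModel, s = lassoFit$bestTune$lambda)` returns the coefficients at the selected lambda. The standalone sketch below uses `glmnet` directly on `mtcars` instead of the Ames data. The penalty `s = 0.5` is arbitrary, not a tuned value.

```r
library(glmnet)

# Standalone sketch on mtcars (not the Ames data).
x <- scale(as.matrix(mtcars[, -1]))   # center and scale, as in the caret call
y <- mtcars$mpg
fit <- glmnet(x, y, alpha = 1)        # alpha = 1 gives the lasso path

s  <- 0.5                             # illustrative penalty value
cf <- as.matrix(coef(fit, s = s))     # sparse coefficients -> dense matrix
nonzero <- cf[cf[, 1] != 0, , drop = FALSE]

nonzero                               # predictors kept by the lasso at s
```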
plot(varImp(lassoFit)) # to plot the most important predictors
## Run kNN
knnFit <-
train(
SalePrice ~ .,
data = train,
method = "knn",
trControl = ctrl,
preProcess = c("center", "scale")
)
Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
10, : These variables have zero variances: `MS Zoning`I (all), UtilitiesNoSewr,
NeighborhoodLandmrk, `Condition 2`PosN, `Condition 2`RRAe, `Exterior
1st`AsphShn, `Exterior 1st`ImStucc, `Exterior 1st`PreCast, `Exterior 2nd`Other,
`Exterior 2nd`PreCast, FunctionalSal, `Misc Feature`TenC
Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
10, : These variables have zero variances: `MS Zoning`I (all), UtilitiesNoSewr,
NeighborhoodLandmrk, `Exterior 1st`PreCast, `Exterior 2nd`Other, `Exterior
2nd`PreCast, FunctionalSal, `Misc Feature`TenC
Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
10, : These variables have zero variances: `MS Zoning`I (all), UtilitiesNoSeWa,
UtilitiesNoSewr, NeighborhoodLandmrk, `Exterior 1st`PreCast, `Exterior
2nd`Other, `Exterior 2nd`PreCast, `Mas Vnr Type`CBlock, `Exter Cond`Po, `Bsmt
Qual`Po, FunctionalSal, FunctionalSev, `Misc Feature`Elev, `Misc Feature`TenC
Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
10, : These variables have zero variances: `MS Zoning`I (all), UtilitiesNoSewr,
NeighborhoodLandmrk, `Roof Matl`Membran, `Roof Matl`Roll, `Exterior 1st`CBlock,
`Exterior 1st`PreCast, `Exterior 1st`Stone, `Exterior 2nd`Other, `Exterior
2nd`PreCast, `Heating QC`Po, ElectricalMix, FunctionalSal, `Pool QC`Fa, `Misc
Feature`TenC
Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
10, : These variables have zero variances: `MS Zoning`I (all), UtilitiesNoSewr,
NeighborhoodLandmrk, NeighborhoodGrnHill, `Condition 2`RRAn, `Condition 2`RRNn,
`Roof Matl`Metal, `Exterior 1st`PreCast, `Exterior 2nd`Other, `Exterior
2nd`PreCast, HeatingOthW, `Kitchen Qual`Po, FunctionalSal, `Misc Feature`TenC,
`Sale Type`VWD
Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
10, : These variables have zero variances: `MS Zoning`I (all), UtilitiesNoSewr,
NeighborhoodLandmrk, `Exterior 1st`PreCast, `Exterior 2nd`Other, `Exterior
2nd`PreCast, `Exter Cond`Po, `Bsmt Qual`Po, `Kitchen Qual`Po, FunctionalSal,
`Pool QC`Fa, `Misc Feature`TenC
Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
10, : These variables have zero variances: `MS Zoning`I (all), UtilitiesNoSeWa,
UtilitiesNoSewr, NeighborhoodLandmrk, `Condition 2`PosA, `Exterior 1st`PreCast,
`Exterior 2nd`Other, `Exterior 2nd`PreCast, FunctionalSal, `Misc Feature`Elev,
`Misc Feature`TenC
Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
10, : These variables have zero variances: `MS Zoning`I (all), UtilitiesNoSewr,
NeighborhoodLandmrk, NeighborhoodGrnHill, `Condition 2`PosN, `Roof
Matl`Membran, `Exterior 1st`PreCast, `Exterior 1st`Stone, `Exterior 2nd`Other,
`Exterior 2nd`PreCast, ElectricalMix, FunctionalSal, FunctionalSev, `Pool
QC`TA, `Misc Feature`TenC
Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
10, : These variables have zero variances: `MS Zoning`I (all), UtilitiesNoSewr,
NeighborhoodLandmrk, `Condition 2`RRNn, `Roof Matl`Metal, `Roof Matl`Roll,
`Exterior 1st`AsphShn, `Exterior 1st`CBlock, `Exterior 1st`ImStucc, `Exterior
1st`PreCast, `Exterior 2nd`CBlock, `Exterior 2nd`Other, `Exterior 2nd`PreCast,
FunctionalSal, `Misc Feature`TenC
Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
10, : These variables have zero variances: `MS Zoning`I (all), UtilitiesNoSewr,
NeighborhoodLandmrk, `Condition 2`RRAe, `Condition 2`RRAn, `Roof Style`Shed,
`Exterior 1st`PreCast, `Exterior 2nd`Other, `Exterior 2nd`PreCast, `Mas Vnr
Type`CBlock, HeatingOthW, FunctionalSal, `Misc Feature`TenC, `Sale Type`VWD
Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
10, : These variables have zero variances: `MS Zoning`I (all), UtilitiesNoSewr,
NeighborhoodLandmrk, `Condition 2`PosN, `Condition 2`RRAe, `Roof Style`Shed,
`Roof Matl`Membran, `Exterior 1st`PreCast, `Exterior 2nd`Other, `Exterior
2nd`PreCast, `Mas Vnr Type`CBlock, `Exter Cond`Po, FunctionalSal, `Pool QC`Fa,
`Misc Feature`TenC, `Sale Type`VWD
Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
10, : These variables have zero variances: `MS Zoning`I (all), UtilitiesNoSeWa,
UtilitiesNoSewr, NeighborhoodLandmrk, `Roof Matl`Roll, `Exterior 1st`PreCast,
`Exterior 1st`Stone, `Exterior 2nd`Other, `Exterior 2nd`PreCast, FunctionalSal,
FunctionalSev, `Misc Feature`TenC
Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
10, : These variables have zero variances: `MS Zoning`I (all), UtilitiesNoSewr,
NeighborhoodLandmrk, `Condition 2`RRAn, `Exterior 1st`AsphShn, `Exterior
1st`ImStucc, `Exterior 1st`PreCast, `Exterior 2nd`Other, `Exterior 2nd`PreCast,
FunctionalSal, `Misc Feature`TenC
Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
10, : These variables have zero variances: `MS Zoning`I (all), UtilitiesNoSewr,
NeighborhoodLandmrk, `Condition 2`RRNn, `Roof Matl`Metal, `Exterior
1st`PreCast, `Exterior 2nd`Other, `Exterior 2nd`PreCast, `Bsmt Qual`Po,
FunctionalSal, `Misc Feature`TenC
Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
10, : These variables have zero variances: `MS Zoning`I (all), UtilitiesNoSewr,
NeighborhoodLandmrk, NeighborhoodGrnHill, `Exterior 1st`CBlock, `Exterior
1st`PreCast, `Exterior 2nd`Other, `Exterior 2nd`PreCast, HeatingOthW,
ElectricalMix, `Kitchen Qual`Po, FunctionalSal, `Misc Feature`Elev, `Misc
Feature`TenC
Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
10, : These variables have zero variances: `MS Zoning`I (all), UtilitiesNoSewr,
NeighborhoodLandmrk, `Exterior 1st`PreCast, `Exterior 2nd`Other, `Exterior
2nd`PreCast, FunctionalSal, `Misc Feature`TenC
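The warnings above come from centering and scaling the one-hot encoded predictors inside each resample (apparently once per candidate k, which is why each one appears three times). Rare levels such as `MS Zoning`I (all) or `Misc Feature`TenC are absent from some training folds, so their dummy column is constant there and has a standard deviation of zero, which cannot be scaled. The minimal base-R sketch below uses a small hypothetical data frame (not the Ames data) to show how such columns can be flagged and dropped. In caret itself, adding the "zv" method before centering and scaling, e.g. `preProcess = c("zv", "center", "scale")` in `train()`, is one way to remove zero-variance columns within each fold.

```r
# Hypothetical stand-in for one training fold after one-hot encoding
fold <- data.frame(
  GrLivArea  = c(1200, 1500, 1800, 2100),
  ZoningRL   = c(1, 0, 1, 1),
  ZoningI    = c(0, 0, 0, 0),   # rare level absent from this fold
  UtilNoSewr = c(0, 0, 0, 0)    # rare level absent from this fold
)

# Names of columns that take a single value (zero variance)
zero_var_cols <- function(d) {
  names(d)[vapply(d, function(col) length(unique(col)) <= 1, logical(1))]
}

zero_var_cols(fold)      # "ZoningI" "UtilNoSewr"

# Scaling a constant column divides 0 by an sd of 0, giving NaN
scale(fold$ZoningI)

# Drop the offending columns before centering and scaling
fold_clean <- fold[, setdiff(names(fold), zero_var_cols(fold)), drop = FALSE]
names(fold_clean)        # "GrLivArea" "ZoningRL"
```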
knnFit # to obtain summary of the model
k-Nearest Neighbors
2054 samples
81 predictor
Pre-processing: centered (287), scaled (287)
Resampling: Cross-Validated (5 fold, repeated 3 times)
Summary of sample sizes: 1643, 1643, 1643, 1644, 1643, 1643, ...
Resampling results across tuning parameters:
k RMSE Rsquared MAE
5 38930.44 0.7633914 25133.32
7 38286.61 0.7741358 24648.10
9 38245.24 0.7758625 24418.59
RMSE was used to select the optimal model using the smallest value.
The final value used for the model was k = 9.
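With caret's default selection rule ("best"), the reported k is simply the row with the smallest resampled RMSE. The sketch below reproduces that choice from the numbers printed above. Because the winner, k = 9, sits on the upper edge of the grid, a wider grid, for example a hypothetical `tuneGrid = expand.grid(k = seq(5, 25, by = 2))` passed to `train()`, might find a lower RMSE; that is an assumption worth checking, not a result shown here.

```r
# Cross-validated results for the k-NN model, copied from the output above
knn_results <- data.frame(
  k        = c(5, 7, 9),
  RMSE     = c(38930.44, 38286.61, 38245.24),
  Rsquared = c(0.7633914, 0.7741358, 0.7758625),
  MAE      = c(25133.32, 24648.10, 24418.59)
)

# Keep the k with the smallest resampled RMSE
best <- knn_results[which.min(knn_results$RMSE), ]
best$k                              # 9

# The best k is the largest one tried: the grid may be too narrow
best$k == max(knn_results$k)        # TRUE
```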
plot(knnFit)
varImp(knnFit) # to see most important parameters
loess r-squared variable importance
only 20 most important variables shown (out of 81)
Overall
Overall Qual 100.00
Neighborhood 80.72
Gr Liv Area 80.08
Total Bsmt SF 74.98
Garage Area 70.58
1st Flr SF 68.65
Garage Cars 66.43
Exter Qual 66.17
Kitchen Qual 56.39
Year Built 50.35
Full Bath 47.58
Year Remod/Add 44.92
BsmtFin SF 1 42.03
Garage Yr Blt 41.42
Mas Vnr Area 41.04
TotRms AbvGrd 39.77
Bsmt Qual 35.03
Fireplaces 34.42
2nd Flr SF 31.70
PID 30.44
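k-NN has no built-in importance measure, so the label "loess r-squared" indicates a model-free filter: each predictor is scored on its own by how well a loess smoother of the sale price on that predictor fits, and the scores are rescaled so the top predictor gets 100. The sketch below illustrates the idea on simulated data; the exact way caret computes the R² and the smoother settings are assumptions and may differ. A smoother is used rather than a correlation because it also captures nonlinear effects such as the U-shape of `x1` here, which a linear correlation would rate near zero.

```r
set.seed(42)
n  <- 200
x1 <- runif(n, -2, 2)            # predictor with a nonlinear (U-shaped) effect
x2 <- runif(n, -2, 2)            # pure noise predictor
y  <- x1^2 + rnorm(n, sd = 0.3)

# R-squared of a loess smoother of y on a single predictor
loess_r2 <- function(x, y) {
  fit <- stats::loess(y ~ x)
  1 - sum((y - fitted(fit))^2) / sum((y - mean(y))^2)
}

r2 <- c(x1 = loess_r2(x1, y), x2 = loess_r2(x2, y))

# Rescale so the weakest predictor scores 0 and the strongest 100
importance <- (r2 - min(r2)) / (max(r2) - min(r2)) * 100
round(importance, 2)             # x1 = 100, x2 = 0
```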
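plot(varImp(knnFit)) # to plot most important parameters
The performance metric for the prediction model should be the Root-Mean-Squared-Error (RMSE) between the logarithm of the predicted value and the logarithm of the observed sale price, a quantity known as the Root-Mean-Squared-Log-Error (RMSLE). A histogram of the sale price shows why the logarithm is recommended: the distribution is right-skewed, with a long tail of expensive houses. On the log scale, an error of a given percentage counts the same for a cheap house as for an expensive one, so the few high-priced sales do not dominate the metric.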
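A minimal sketch of the RMSLE described above, with a toy example in which every prediction is 10% too high. The function name `rmsle_manual` is hypothetical; it uses `log()` and therefore assumes strictly positive prices, whereas `Metrics::rmsle()` uses `log(1 + x)`, which gives nearly identical values for prices in the tens of thousands.

```r
# Root-mean-squared log error (assumes all prices are strictly positive)
rmsle_manual <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted),
            all(actual > 0), all(predicted > 0))
  sqrt(mean((log(predicted) - log(actual))^2))
}

# Toy example: each prediction is off by 10%, for cheap and expensive houses alike
actual    <- c(100000, 200000, 500000)
predicted <- actual * 1.1

rmsle_manual(actual, predicted)        # equals log(1.1), about 0.0953
sqrt(mean((predicted - actual)^2))     # plain RMSE, about 31623, driven mostly by the 500k house
```

Every house contributes the same log error here, while the plain RMSE is dominated by the most expensive sale, which is the behaviour the log transform is meant to avoid.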
# Make predictions on the test data (`model` stands for any fitted caret model)
predictions <- predict(model, newdata = test)
# Calculate evaluation metrics (e.g., RMSE)
model_rmse <- caret::RMSE(predictions, test$SalePrice)
# LASSO
pred_lassoFit <-
predict(lassoFit, newdata = test)
lasso_rmse <-
rmse(
actual = test$SalePrice,
predicted = pred_lassoFit
) %>%
round(3)
# KNN
pred_knn <-
predict(knnFit, newdata = test)
knn_rmse <-
rmse(
actual = test$SalePrice,
predicted = pred_knn
) %>%
round(3)
data.table(
Model = c("Lasso" , "KNN"),
RMSE = c(lasso_rmse, knn_rmse)
) %T>%
setorder(RMSE) %>%
.[, .(Rank= 1:.N, Model, RMSE)] %>%
kbl(
caption = "Model performance",
align = 'l',
centering = FALSE
) %>%
kable_styling(
full_width = FALSE,
position = "left",
htmltable_class = "lightable-hover lightable-condensed lightable-striped"
)
## Appendix
desc <-
paste0(github_ames, "data_description.txt") %>%
readLines()
print(desc)
 [1] "MSSubClass: Identifies the type of dwelling involved in the sale.\t"
[2] ""
[3] " 20\t1-STORY 1946 & NEWER ALL STYLES"
[4] " 30\t1-STORY 1945 & OLDER"
[5] " 40\t1-STORY W/FINISHED ATTIC ALL AGES"
[6] " 45\t1-1/2 STORY - UNFINISHED ALL AGES"
[7] " 50\t1-1/2 STORY FINISHED ALL AGES"
[8] " 60\t2-STORY 1946 & NEWER"
[9] " 70\t2-STORY 1945 & OLDER"
[10] " 75\t2-1/2 STORY ALL AGES"
[11] " 80\tSPLIT OR MULTI-LEVEL"
[12] " 85\tSPLIT FOYER"
[13] " 90\tDUPLEX - ALL STYLES AND AGES"
[14] " 120\t1-STORY PUD (Planned Unit Development) - 1946 & NEWER"
[15] " 150\t1-1/2 STORY PUD - ALL AGES"
[16] " 160\t2-STORY PUD - 1946 & NEWER"
[17] " 180\tPUD - MULTILEVEL - INCL SPLIT LEV/FOYER"
[18] " 190\t2 FAMILY CONVERSION - ALL STYLES AND AGES"
[19] ""
[20] "MSZoning: Identifies the general zoning classification of the sale."
[21] "\t\t"
[22] " A\tAgriculture"
[23] " C\tCommercial"
[24] " FV\tFloating Village Residential"
[25] " I\tIndustrial"
[26] " RH\tResidential High Density"
[27] " RL\tResidential Low Density"
[28] " RP\tResidential Low Density Park "
[29] " RM\tResidential Medium Density"
[30] "\t"
[31] "LotFrontage: Linear feet of street connected to property"
[32] ""
[33] "LotArea: Lot size in square feet"
[34] ""
[35] "Street: Type of road access to property"
[36] ""
[37] " Grvl\tGravel\t"
[38] " Pave\tPaved"
[39] " \t"
[40] "Alley: Type of alley access to property"
[41] ""
[42] " Grvl\tGravel"
[43] " Pave\tPaved"
[44] " NA \tNo alley access"
[45] "\t\t"
[46] "LotShape: General shape of property"
[47] ""
[48] " Reg\tRegular\t"
[49] " IR1\tSlightly irregular"
[50] " IR2\tModerately Irregular"
[51] " IR3\tIrregular"
[52] " "
[53] "LandContour: Flatness of the property"
[54] ""
[55] " Lvl\tNear Flat/Level\t"
[56] " Bnk\tBanked - Quick and significant rise from street grade to building"
[57] " HLS\tHillside - Significant slope from side to side"
[58] " Low\tDepression"
[59] "\t\t"
[60] "Utilities: Type of utilities available"
[61] "\t\t"
[62] " AllPub\tAll public Utilities (E,G,W,& S)\t"
[63] " NoSewr\tElectricity, Gas, and Water (Septic Tank)"
[64] " NoSeWa\tElectricity and Gas Only"
[65] " ELO\tElectricity only\t"
[66] "\t"
[67] "LotConfig: Lot configuration"
[68] ""
[69] " Inside\tInside lot"
[70] " Corner\tCorner lot"
[71] " CulDSac\tCul-de-sac"
[72] " FR2\tFrontage on 2 sides of property"
[73] " FR3\tFrontage on 3 sides of property"
[74] "\t"
[75] "LandSlope: Slope of property"
[76] "\t\t"
[77] " Gtl\tGentle slope"
[78] " Mod\tModerate Slope\t"
[79] " Sev\tSevere Slope"
[80] "\t"
[81] "Neighborhood: Physical locations within Ames city limits"
[82] ""
[83] " Blmngtn\tBloomington Heights"
[84] " Blueste\tBluestem"
[85] " BrDale\tBriardale"
[86] " BrkSide\tBrookside"
[87] " ClearCr\tClear Creek"
[88] " CollgCr\tCollege Creek"
[89] " Crawfor\tCrawford"
[90] " Edwards\tEdwards"
[91] " Gilbert\tGilbert"
[92] " IDOTRR\tIowa DOT and Rail Road"
[93] " MeadowV\tMeadow Village"
[94] " Mitchel\tMitchell"
[95] " Names\tNorth Ames"
[96] " NoRidge\tNorthridge"
[97] " NPkVill\tNorthpark Villa"
[98] " NridgHt\tNorthridge Heights"
[99] " NWAmes\tNorthwest Ames"
[100] " OldTown\tOld Town"
[101] " SWISU\tSouth & West of Iowa State University"
[102] " Sawyer\tSawyer"
[103] " SawyerW\tSawyer West"
[104] " Somerst\tSomerset"
[105] " StoneBr\tStone Brook"
[106] " Timber\tTimberland"
[107] " Veenker\tVeenker"
[108] "\t\t\t"
[109] "Condition1: Proximity to various conditions"
[110] "\t"
[111] " Artery\tAdjacent to arterial street"
[112] " Feedr\tAdjacent to feeder street\t"
[113] " Norm\tNormal\t"
[114] " RRNn\tWithin 200' of North-South Railroad"
[115] " RRAn\tAdjacent to North-South Railroad"
[116] " PosN\tNear positive off-site feature--park, greenbelt, etc."
[117] " PosA\tAdjacent to postive off-site feature"
[118] " RRNe\tWithin 200' of East-West Railroad"
[119] " RRAe\tAdjacent to East-West Railroad"
[120] "\t"
[121] "Condition2: Proximity to various conditions (if more than one is present)"
[122] "\t\t"
[123] " Artery\tAdjacent to arterial street"
[124] " Feedr\tAdjacent to feeder street\t"
[125] " Norm\tNormal\t"
[126] " RRNn\tWithin 200' of North-South Railroad"
[127] " RRAn\tAdjacent to North-South Railroad"
[128] " PosN\tNear positive off-site feature--park, greenbelt, etc."
[129] " PosA\tAdjacent to postive off-site feature"
[130] " RRNe\tWithin 200' of East-West Railroad"
[131] " RRAe\tAdjacent to East-West Railroad"
[132] "\t"
[133] "BldgType: Type of dwelling"
[134] "\t\t"
[135] " 1Fam\tSingle-family Detached\t"
[136] " 2FmCon\tTwo-family Conversion; originally built as one-family dwelling"
[137] " Duplx\tDuplex"
[138] " TwnhsE\tTownhouse End Unit"
[139] " TwnhsI\tTownhouse Inside Unit"
[140] "\t"
[141] "HouseStyle: Style of dwelling"
[142] "\t"
[143] " 1Story\tOne story"
[144] " 1.5Fin\tOne and one-half story: 2nd level finished"
[145] " 1.5Unf\tOne and one-half story: 2nd level unfinished"
[146] " 2Story\tTwo story"
[147] " 2.5Fin\tTwo and one-half story: 2nd level finished"
[148] " 2.5Unf\tTwo and one-half story: 2nd level unfinished"
[149] " SFoyer\tSplit Foyer"
[150] " SLvl\tSplit Level"
[151] "\t"
[152] "OverallQual: Rates the overall material and finish of the house"
[153] ""
[154] " 10\tVery Excellent"
[155] " 9\tExcellent"
[156] " 8\tVery Good"
[157] " 7\tGood"
[158] " 6\tAbove Average"
[159] " 5\tAverage"
[160] " 4\tBelow Average"
[161] " 3\tFair"
[162] " 2\tPoor"
[163] " 1\tVery Poor"
[164] "\t"
[165] "OverallCond: Rates the overall condition of the house"
[166] ""
[167] " 10\tVery Excellent"
[168] " 9\tExcellent"
[169] " 8\tVery Good"
[170] " 7\tGood"
[171] " 6\tAbove Average\t"
[172] " 5\tAverage"
[173] " 4\tBelow Average\t"
[174] " 3\tFair"
[175] " 2\tPoor"
[176] " 1\tVery Poor"
[177] "\t\t"
[178] "YearBuilt: Original construction date"
[179] ""
[180] "YearRemodAdd: Remodel date (same as construction date if no remodeling or additions)"
[181] ""
[182] "RoofStyle: Type of roof"
[183] ""
[184] " Flat\tFlat"
[185] " Gable\tGable"
[186] " Gambrel\tGabrel (Barn)"
[187] " Hip\tHip"
[188] " Mansard\tMansard"
[189] " Shed\tShed"
[190] "\t\t"
[191] "RoofMatl: Roof material"
[192] ""
[193] " ClyTile\tClay or Tile"
[194] " CompShg\tStandard (Composite) Shingle"
[195] " Membran\tMembrane"
[196] " Metal\tMetal"
[197] " Roll\tRoll"
[198] " Tar&Grv\tGravel & Tar"
[199] " WdShake\tWood Shakes"
[200] " WdShngl\tWood Shingles"
[201] "\t\t"
[202] "Exterior1st: Exterior covering on house"
[203] ""
[204] " AsbShng\tAsbestos Shingles"
[205] " AsphShn\tAsphalt Shingles"
[206] " BrkComm\tBrick Common"
[207] " BrkFace\tBrick Face"
[208] " CBlock\tCinder Block"
[209] " CemntBd\tCement Board"
[210] " HdBoard\tHard Board"
[211] " ImStucc\tImitation Stucco"
[212] " MetalSd\tMetal Siding"
[213] " Other\tOther"
[214] " Plywood\tPlywood"
[215] " PreCast\tPreCast\t"
[216] " Stone\tStone"
[217] " Stucco\tStucco"
[218] " VinylSd\tVinyl Siding"
[219] " Wd Sdng\tWood Siding"
[220] " WdShing\tWood Shingles"
[221] "\t"
[222] "Exterior2nd: Exterior covering on house (if more than one material)"
[223] ""
[224] " AsbShng\tAsbestos Shingles"
[225] " AsphShn\tAsphalt Shingles"
[226] " BrkComm\tBrick Common"
[227] " BrkFace\tBrick Face"
[228] " CBlock\tCinder Block"
[229] " CemntBd\tCement Board"
[230] " HdBoard\tHard Board"
[231] " ImStucc\tImitation Stucco"
[232] " MetalSd\tMetal Siding"
[233] " Other\tOther"
[234] " Plywood\tPlywood"
[235] " PreCast\tPreCast"
[236] " Stone\tStone"
[237] " Stucco\tStucco"
[238] " VinylSd\tVinyl Siding"
[239] " Wd Sdng\tWood Siding"
[240] " WdShing\tWood Shingles"
[241] "\t"
[242] "MasVnrType: Masonry veneer type"
[243] ""
[244] " BrkCmn\tBrick Common"
[245] " BrkFace\tBrick Face"
[246] " CBlock\tCinder Block"
[247] " None\tNone"
[248] " Stone\tStone"
[249] "\t"
[250] "MasVnrArea: Masonry veneer area in square feet"
[251] ""
[252] "ExterQual: Evaluates the quality of the material on the exterior "
[253] "\t\t"
[254] " Ex\tExcellent"
[255] " Gd\tGood"
[256] " TA\tAverage/Typical"
[257] " Fa\tFair"
[258] " Po\tPoor"
[259] "\t\t"
[260] "ExterCond: Evaluates the present condition of the material on the exterior"
[261] "\t\t"
[262] " Ex\tExcellent"
[263] " Gd\tGood"
[264] " TA\tAverage/Typical"
[265] " Fa\tFair"
[266] " Po\tPoor"
[267] "\t\t"
[268] "Foundation: Type of foundation"
[269] "\t\t"
[270] " BrkTil\tBrick & Tile"
[271] " CBlock\tCinder Block"
[272] " PConc\tPoured Contrete\t"
[273] " Slab\tSlab"
[274] " Stone\tStone"
[275] " Wood\tWood"
[276] "\t\t"
[277] "BsmtQual: Evaluates the height of the basement"
[278] ""
[279] " Ex\tExcellent (100+ inches)\t"
[280] " Gd\tGood (90-99 inches)"
[281] " TA\tTypical (80-89 inches)"
[282] " Fa\tFair (70-79 inches)"
[283] " Po\tPoor (<70 inches"
[284] " NA\tNo Basement"
[285] "\t\t"
[286] "BsmtCond: Evaluates the general condition of the basement"
[287] ""
[288] " Ex\tExcellent"
[289] " Gd\tGood"
[290] " TA\tTypical - slight dampness allowed"
[291] " Fa\tFair - dampness or some cracking or settling"
[292] " Po\tPoor - Severe cracking, settling, or wetness"
[293] " NA\tNo Basement"
[294] "\t"
[295] "BsmtExposure: Refers to walkout or garden level walls"
[296] ""
[297] " Gd\tGood Exposure"
[298] " Av\tAverage Exposure (split levels or foyers typically score average or above)\t"
[299] " Mn\tMimimum Exposure"
[300] " No\tNo Exposure"
[301] " NA\tNo Basement"
[302] "\t"
[303] "BsmtFinType1: Rating of basement finished area"
[304] ""
[305] " GLQ\tGood Living Quarters"
[306] " ALQ\tAverage Living Quarters"
[307] " BLQ\tBelow Average Living Quarters\t"
[308] " Rec\tAverage Rec Room"
[309] " LwQ\tLow Quality"
[310] " Unf\tUnfinshed"
[311] " NA\tNo Basement"
[312] "\t\t"
[313] "BsmtFinSF1: Type 1 finished square feet"
[314] ""
[315] "BsmtFinType2: Rating of basement finished area (if multiple types)"
[316] ""
[317] " GLQ\tGood Living Quarters"
[318] " ALQ\tAverage Living Quarters"
[319] " BLQ\tBelow Average Living Quarters\t"
[320] " Rec\tAverage Rec Room"
[321] " LwQ\tLow Quality"
[322] " Unf\tUnfinshed"
[323] " NA\tNo Basement"
[324] ""
[325] "BsmtFinSF2: Type 2 finished square feet"
[326] ""
[327] "BsmtUnfSF: Unfinished square feet of basement area"
[328] ""
[329] "TotalBsmtSF: Total square feet of basement area"
[330] ""
[331] "Heating: Type of heating"
[332] "\t\t"
[333] " Floor\tFloor Furnace"
[334] " GasA\tGas forced warm air furnace"
[335] " GasW\tGas hot water or steam heat"
[336] " Grav\tGravity furnace\t"
[337] " OthW\tHot water or steam heat other than gas"
[338] " Wall\tWall furnace"
[339] "\t\t"
[340] "HeatingQC: Heating quality and condition"
[341] ""
[342] " Ex\tExcellent"
[343] " Gd\tGood"
[344] " TA\tAverage/Typical"
[345] " Fa\tFair"
[346] " Po\tPoor"
[347] "\t\t"
[348] "CentralAir: Central air conditioning"
[349] ""
[350] " N\tNo"
[351] " Y\tYes"
[352] "\t\t"
[353] "Electrical: Electrical system"
[354] ""
[355] " SBrkr\tStandard Circuit Breakers & Romex"
[356] " FuseA\tFuse Box over 60 AMP and all Romex wiring (Average)\t"
[357] " FuseF\t60 AMP Fuse Box and mostly Romex wiring (Fair)"
[358] " FuseP\t60 AMP Fuse Box and mostly knob & tube wiring (poor)"
[359] " Mix\tMixed"
[360] "\t\t"
[361] "1stFlrSF: First Floor square feet"
[362] " "
[363] "2ndFlrSF: Second floor square feet"
[364] ""
[365] "LowQualFinSF: Low quality finished square feet (all floors)"
[366] ""
[367] "GrLivArea: Above grade (ground) living area square feet"
[368] ""
[369] "BsmtFullBath: Basement full bathrooms"
[370] ""
[371] "BsmtHalfBath: Basement half bathrooms"
[372] ""
[373] "FullBath: Full bathrooms above grade"
[374] ""
[375] "HalfBath: Half baths above grade"
[376] ""
[377] "Bedroom: Bedrooms above grade (does NOT include basement bedrooms)"
[378] ""
[379] "Kitchen: Kitchens above grade"
[380] ""
[381] "KitchenQual: Kitchen quality"
[382] ""
[383] " Ex\tExcellent"
[384] " Gd\tGood"
[385] " TA\tTypical/Average"
[386] " Fa\tFair"
[387] " Po\tPoor"
[388] " \t"
[389] "TotRmsAbvGrd: Total rooms above grade (does not include bathrooms)"
[390] ""
[391] "Functional: Home functionality (Assume typical unless deductions are warranted)"
[392] ""
[393] " Typ\tTypical Functionality"
[394] " Min1\tMinor Deductions 1"
[395] " Min2\tMinor Deductions 2"
[396] " Mod\tModerate Deductions"
[397] " Maj1\tMajor Deductions 1"
[398] " Maj2\tMajor Deductions 2"
[399] " Sev\tSeverely Damaged"
[400] " Sal\tSalvage only"
[401] "\t\t"
[402] "Fireplaces: Number of fireplaces"
[403] ""
[404] "FireplaceQu: Fireplace quality"
[405] ""
[406] " Ex\tExcellent - Exceptional Masonry Fireplace"
[407] " Gd\tGood - Masonry Fireplace in main level"
[408] " TA\tAverage - Prefabricated Fireplace in main living area or Masonry Fireplace in basement"
[409] " Fa\tFair - Prefabricated Fireplace in basement"
[410] " Po\tPoor - Ben Franklin Stove"
[411] " NA\tNo Fireplace"
[412] "\t\t"
[413] "GarageType: Garage location"
[414] "\t\t"
[415] " 2Types\tMore than one type of garage"
[416] " Attchd\tAttached to home"
[417] " Basment\tBasement Garage"
[418] " BuiltIn\tBuilt-In (Garage part of house - typically has room above garage)"
[419] " CarPort\tCar Port"
[420] " Detchd\tDetached from home"
[421] " NA\tNo Garage"
[422] "\t\t"
[423] "GarageYrBlt: Year garage was built"
[424] "\t\t"
[425] "GarageFinish: Interior finish of the garage"
[426] ""
[427] " Fin\tFinished"
[428] " RFn\tRough Finished\t"
[429] " Unf\tUnfinished"
[430] " NA\tNo Garage"
[431] "\t\t"
[432] "GarageCars: Size of garage in car capacity"
[433] ""
[434] "GarageArea: Size of garage in square feet"
[435] ""
[436] "GarageQual: Garage quality"
[437] ""
[438] " Ex\tExcellent"
[439] " Gd\tGood"
[440] " TA\tTypical/Average"
[441] " Fa\tFair"
[442] " Po\tPoor"
[443] " NA\tNo Garage"
[444] "\t\t"
[445] "GarageCond: Garage condition"
[446] ""
[447] " Ex\tExcellent"
[448] " Gd\tGood"
[449] " TA\tTypical/Average"
[450] " Fa\tFair"
[451] " Po\tPoor"
[452] " NA\tNo Garage"
[453] "\t\t"
[454] "PavedDrive: Paved driveway"
[455] ""
[456] " Y\tPaved "
[457] " P\tPartial Pavement"
[458] " N\tDirt/Gravel"
[459] "\t\t"
[460] "WoodDeckSF: Wood deck area in square feet"
[461] ""
[462] "OpenPorchSF: Open porch area in square feet"
[463] ""
[464] "EnclosedPorch: Enclosed porch area in square feet"
[465] ""
[466] "3SsnPorch: Three season porch area in square feet"
[467] ""
[468] "ScreenPorch: Screen porch area in square feet"
[469] ""
[470] "PoolArea: Pool area in square feet"
[471] ""
[472] "PoolQC: Pool quality"
[473] "\t\t"
[474] " Ex\tExcellent"
[475] " Gd\tGood"
[476] " TA\tAverage/Typical"
[477] " Fa\tFair"
[478] " NA\tNo Pool"
[479] "\t\t"
[480] "Fence: Fence quality"
[481] "\t\t"
[482] " GdPrv\tGood Privacy"
[483] " MnPrv\tMinimum Privacy"
[484] " GdWo\tGood Wood"
[485] " MnWw\tMinimum Wood/Wire"
[486] " NA\tNo Fence"
[487] "\t"
[488] "MiscFeature: Miscellaneous feature not covered in other categories"
[489] "\t\t"
[490] " Elev\tElevator"
[491] " Gar2\t2nd Garage (if not described in garage section)"
[492] " Othr\tOther"
[493] " Shed\tShed (over 100 SF)"
[494] " TenC\tTennis Court"
[495] " NA\tNone"
[496] "\t\t"
[497] "MiscVal: $Value of miscellaneous feature"
[498] ""
[499] "MoSold: Month Sold (MM)"
[500] ""
[501] "YrSold: Year Sold (YYYY)"
[502] ""
[503] "SaleType: Type of sale"
[504] "\t\t"
[505] " WD \tWarranty Deed - Conventional"
[506] " CWD\tWarranty Deed - Cash"
[507] " VWD\tWarranty Deed - VA Loan"
[508] " New\tHome just constructed and sold"
[509] " COD\tCourt Officer Deed/Estate"
[510] " Con\tContract 15% Down payment regular terms"
[511] " ConLw\tContract Low Down payment and low interest"
[512] " ConLI\tContract Low Interest"
[513] " ConLD\tContract Low Down"
[514] " Oth\tOther"
[515] "\t\t"
[516] "SaleCondition: Condition of sale"
[517] ""
[518] " Normal\tNormal Sale"
[519] " Abnorml\tAbnormal Sale - trade, foreclosure, short sale"
[520] " AdjLand\tAdjoining Land Purchase"
[521] " Alloca\tAllocation - two linked properties with separate deeds, typically condo with a garage unit\t"
[522] " Family\tSale between family members"
[523] " Partial\tHome was not completed when last assessed (associated with New Homes)"
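The printout above is hard to scan. A minimal sketch, assuming the file keeps its current layout (each variable heading of the form `Name: description` starts at the beginning of a line, while category codes are indented), that turns the lines returned by `readLines()` into a two-column lookup table. The sample vector copies a few lines from the output; with the real file you would pass `desc` directly. These names lack the spaces used in AmesHousing.csv (e.g. GrLivArea vs Gr Liv Area, TotRmsAbvGrd vs TotRms AbvGrd), so matching them to the data columns needs extra care.

```r
# A few lines copied from data_description.txt, as returned by readLines()
desc_sample <- c(
  "MSSubClass: Identifies the type of dwelling involved in the sale.\t",
  "",
  " 20\t1-STORY 1946 & NEWER ALL STYLES",
  "LotArea: Lot size in square feet",
  "HouseStyle: Style of dwelling",
  " 1.5Fin\tOne and one-half story: 2nd level finished",
  "1stFlrSF: First Floor square feet"
)

# Variable headings start at column 1; category codes are indented
is_heading <- grepl("^[A-Za-z0-9]+: ", desc_sample)

dictionary <- data.frame(
  variable    = sub(":.*$", "", desc_sample[is_heading]),
  description = trimws(sub("^[^:]+: ", "", desc_sample[is_heading])),
  stringsAsFactors = FALSE
)
dictionary
```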